Paper Title

Enhancing Few-shot Image Classification with Cosine Transformer

Paper Authors

Quang-Huy Nguyen, Cuong Q. Nguyen, Dung D. Le, Hieu H. Pham

Paper Abstract

This paper addresses the few-shot image classification problem, where the classification task is performed on unlabeled query samples given only a small number of labeled support samples. One major challenge of few-shot learning is the large variety of object visual appearances, which prevents the support samples from representing that object comprehensively. This can result in a significant difference between support and query samples, thereby undermining the performance of few-shot algorithms. In this paper, we tackle the problem by proposing the Few-shot Cosine Transformer (FS-CT), in which the relational map between supports and queries is obtained effectively for few-shot tasks. FS-CT consists of two parts: a learnable prototypical embedding network that obtains categorical representations from support samples, including hard cases, and a transformer encoder that effectively derives the relational map between support and query samples. We introduce Cosine Attention, a more robust and stable attention module that significantly enhances the transformer module and thereby improves FS-CT performance by 5% to over 20% in accuracy compared to the default scaled dot-product mechanism. Our method achieves competitive results on mini-ImageNet, CUB-200, and CIFAR-FS for 1-shot and 5-shot learning tasks across backbones and few-shot configurations. We also developed a custom few-shot dataset for Yoga pose recognition to demonstrate the potential of our algorithm in practical applications. Our FS-CT with Cosine Attention is a lightweight, simple few-shot algorithm that can be applied to a wide range of applications, such as healthcare, medicine, and security surveillance. The official implementation code of our Few-shot Cosine Transformer is available at https://github.com/vinuni-vishc/Few-Shot-Cosine-Transformer
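
For illustration, the sketch below contrasts the standard scaled dot-product attention with a cosine-similarity variant in PyTorch. It is a minimal sketch of one way the query-key similarity could be computed from L2-normalized vectors; the paper's exact formulation (including its handling of values, heads, and projections) may differ, so the official repository above remains the authoritative implementation. The shapes in the example (a 5-way episode with 75 query embeddings of dimension 64) are illustrative assumptions, not the paper's settings.

# Minimal sketch (PyTorch): scaled dot-product attention vs. a cosine-similarity variant.
# Assumption for illustration only; not the paper's exact Cosine Attention module.
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # q: (batch, len_q, dim); k, v: (batch, len_k, dim)
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    return F.softmax(scores, dim=-1) @ v

def cosine_attention(q, k, v):
    # L2-normalize queries and keys so their dot product equals cosine similarity,
    # keeping attention scores bounded in [-1, 1] regardless of feature magnitude.
    q = F.normalize(q, dim=-1)
    k = F.normalize(k, dim=-1)
    scores = q @ k.transpose(-2, -1)
    return F.softmax(scores, dim=-1) @ v

# Hypothetical 5-way episode: 75 query embeddings attend over 5 class prototypes.
queries = torch.randn(1, 75, 64)
prototypes = torch.randn(1, 5, 64)
out = cosine_attention(queries, prototypes, prototypes)
print(out.shape)  # torch.Size([1, 75, 64])

One plausible reading of the robustness claim, under this sketch, is that bounding the similarity scores prevents the softmax from saturating when feature norms vary widely across support and query embeddings, which the unnormalized dot product does not guard against.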
