Paper Title
LASP: Text-to-Text Optimization for Language-Aware Soft Prompting of Vision & Language Models
Paper Authors
Paper Abstract
Soft prompt learning has recently emerged as one of the methods of choice for adapting V&L models to a downstream task using a few training examples. However, current methods significantly overfit the training data, suffering from large accuracy degradation when tested on unseen classes from the same domain. To this end, in this paper, we make the following 4 contributions: (1) To alleviate base class overfitting, we propose a novel Language-Aware Soft Prompting (LASP) learning method by means of a text-to-text cross-entropy loss that maximizes the probability of the learned prompts to be correctly classified with respect to pre-defined hand-crafted textual prompts. (2) To increase the representation capacity of the prompts, we propose grouped LASP where each group of prompts is optimized with respect to a separate subset of textual prompts. (3) We identify a visual-language misalignment introduced by prompt learning and LASP, and more importantly, propose a re-calibration mechanism to address it. (4) We show that LASP is inherently amenable to including, during training, virtual classes, i.e. class names for which no visual samples are available, further increasing the robustness of the learned prompts. Through evaluations on 11 datasets, we show that our approach (a) significantly outperforms all prior works on soft prompting, and (b) matches and surpasses, for the first time, the accuracy on novel classes obtained by hand-crafted prompts and CLIP for 8 out of 11 test datasets. Code will be made available at https://www.adrianbulat.com/lasp
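The core idea in contribution (1), a text-to-text cross-entropy that pushes each learned soft prompt to be classified as its own class against the hand-crafted prompts, can be sketched as follows. This is a minimal NumPy illustration under assumed shapes and a CLIP-style temperature, not the authors' implementation:

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Project feature vectors onto the unit sphere, as CLIP-style models do."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def text_to_text_loss(learned_feats, handcrafted_feats, temperature=0.07):
    """Sketch of a LASP-style text-to-text cross-entropy loss.

    learned_feats:     (C, D) text-encoder features of the learned soft
                       prompts, one row per class (assumed layout).
    handcrafted_feats: (C, D) features of fixed hand-crafted prompts
                       (e.g. "a photo of a <class>"), used as the
                       classifier weights.
    The target for the learned prompt of class i is class i itself.
    """
    z = l2_normalize(learned_feats)
    w = l2_normalize(handcrafted_feats)
    logits = z @ w.T / temperature                 # (C, C) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the identity labels: row i should predict class i.
    return -np.mean(np.diag(log_probs))
```

When the learned prompt features coincide with the hand-crafted ones, each row's highest similarity falls on the diagonal and the loss is small; mismatched prompts raise it, which is the regularization effect the abstract attributes to this term.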