Paper Title
Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners
Paper Authors
Paper Abstract
Meta-training, which fine-tunes the language model (LM) on various downstream tasks by maximizing the likelihood of the target label given the task instruction and input instance, has improved the zero-shot task generalization performance. However, meta-trained LMs still struggle to generalize to challenging tasks containing novel labels unseen during meta-training. In this paper, we propose Flipped Learning, an alternative method of meta-training which trains the LM to generate the task instruction given the input instance and label. During inference, the LM trained with Flipped Learning, referred to as Flipped, selects the label option that is most likely to generate the task instruction. On 14 tasks of the BIG-bench benchmark, the 11B-sized Flipped outperforms zero-shot T0-11B and even a 16 times larger 3-shot GPT-3 (175B) on average by 8.4% and 9.7% points, respectively. Flipped gives particularly large improvements on tasks with unseen labels, outperforming T0-11B by up to +20% average F1 score. This indicates that the strong task generalization of Flipped comes from improved generalization to novel labels. We release our code at https://github.com/seonghyeonye/Flipped-Learning.
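The inference procedure described above can be sketched in a few lines: rather than scoring each label given the instruction and input (the standard direction), a Flipped-style model scores the instruction given the input and each candidate label, then picks the argmax. The `log_likelihood` function below is a hypothetical toy stand-in; a real implementation would sum the instruction's token log-probabilities under a seq2seq LM conditioned on the input and label.

```python
import math

def log_likelihood(instruction: str, conditioning: str) -> float:
    # Toy scorer (hypothetical, NOT the paper's model): rewards word overlap
    # between the instruction and the conditioning text, with a small length
    # penalty. A real implementation would use an LM's token log-probs.
    inst = set(instruction.lower().split())
    cond = set(conditioning.lower().split())
    return math.log1p(len(inst & cond)) - 0.01 * len(cond)

def flipped_select(instruction: str, input_instance: str,
                   label_options: list[str]) -> str:
    # Flipped-style inference: condition on (input instance, candidate label)
    # and score how likely the model is to generate the task instruction;
    # return the label whose conditioning makes the instruction most likely.
    scores = {
        label: log_likelihood(instruction, f"{input_instance} {label}")
        for label in label_options
    }
    return max(scores, key=scores.get)

# Example (toy): the label that better "explains" the instruction wins.
choice = flipped_select(
    "classify the review as positive or negative",
    "the movie was great",
    ["maybe", "positive"],
)
print(choice)  # → positive
```

Note that only the direction of scoring changes relative to standard meta-trained models; the set of candidate labels and the instruction template stay the same, which is why the method can be applied at inference time without generation.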