Paper Title


Aligning the Pretraining and Finetuning Objectives of Language Models

Authors

Nuo Wang Pierse, Jingwen Lu

Abstract


We demonstrate that explicitly aligning the pretraining objectives with the finetuning objectives in language model training significantly improves finetuning task performance and reduces the minimum number of finetuning examples required. The performance margin gained from objective alignment allows us to build smaller language models for tasks with less available training data. We provide empirical evidence for these claims by applying objective alignment to concept-of-interest tagging and acronym detection tasks. We found that, with objective alignment, our 768 by 3 and 512 by 3 transformer language models can reach accuracies of 83.9%/82.5% for concept-of-interest tagging and 73.8%/70.2% for acronym detection using only 200 finetuning examples per task, outperforming the 768 by 3 model pretrained without objective alignment by +4.8%/+3.4% and +9.9%/+6.3%. We name finetuning small language models in the presence of hundreds of training examples or fewer "Few Example Learning". In practice, Few Example Learning enabled by objective alignment not only saves human labeling costs, but also makes it possible to leverage language models in more real-time applications.
