Paper Title

Simple Recurrence Improves Masked Language Models

Authors

Tao Lei, Ran Tian, Jasmijn Bastings, Ankur P. Parikh

Abstract

In this work, we explore whether modeling recurrence into the Transformer architecture can both be beneficial and efficient, by building an extremely simple recurrent module into the Transformer. We compare our model to baselines following the training and evaluation recipe of BERT. Our results confirm that recurrence can indeed improve Transformer models by a consistent margin, without requiring low-level performance optimizations, and while keeping the number of parameters constant. For example, our base model achieves an absolute improvement of 2.1 points averaged across 10 tasks and also demonstrates increased stability in fine-tuning over a range of learning rates.
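To make the idea concrete, below is a minimal PyTorch sketch of a Transformer encoder layer with an added recurrent sublayer. The choice of a single-layer GRU and its placement after self-attention are illustrative assumptions, not the paper's exact module, and this sketch does not replicate the paper's trick of keeping the total parameter count constant.

```python
# Minimal sketch: a Transformer encoder layer with a simple recurrent sublayer.
# The GRU here is a stand-in for the paper's recurrent module (assumption).
import torch
import torch.nn as nn

class RecurrentTransformerLayer(nn.Module):
    def __init__(self, d_model=768, n_heads=12, d_ff=3072, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads,
                                          dropout=dropout, batch_first=True)
        # Hypothetical recurrent module inserted into the layer. Note: this
        # adds parameters; the paper instead keeps the parameter count fixed.
        self.recurrence = nn.GRU(d_model, d_model, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                                nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, key_padding_mask=None):
        # Self-attention sublayer with residual connection and LayerNorm.
        attn_out, _ = self.attn(x, x, x, key_padding_mask=key_padding_mask)
        x = self.norm1(x + self.dropout(attn_out))
        # Added recurrent sublayer, also wrapped in a residual connection.
        rec_out, _ = self.recurrence(x)
        x = self.norm2(x + self.dropout(rec_out))
        # Standard position-wise feed-forward sublayer.
        x = self.norm3(x + self.dropout(self.ff(x)))
        return x

# Usage: a batch of 2 sequences of length 16 with hidden size 768.
layer = RecurrentTransformerLayer()
hidden = torch.randn(2, 16, 768)
out = layer(hidden)
print(out.shape)  # torch.Size([2, 16, 768])
```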
