Paper Title
Position Masking for Language Models
Paper Authors
Paper Abstract
Masked language modeling (MLM) pre-training models such as BERT corrupt the input by replacing some tokens with [MASK] and then train a model to reconstruct the original tokens. This is an effective technique which has led to good results on all NLP benchmarks. We propose to expand upon this idea by masking the positions of some tokens along with the masked input token ids. We follow the same standard approach as BERT: we mask a percentage of the token positions and then predict their original values using an additional fully connected classifier stage. This approach has shown good performance gains (a 0.3\% improvement) on SQuAD, along with an additional improvement in convergence times. For the Graphcore IPU, the convergence of BERT Base with position masking requires only 50\% of the tokens from the original BERT paper.
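The abstract does not spell out the implementation, so the following is a minimal PyTorch sketch of the idea as described: position ids are corrupted in the same spirit as token ids under MLM, and an additional fully connected classifier head predicts the original positions. The names (`PositionMaskingHead`, `mask_positions`), the sentinel position id, and the masking ratio are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn


class PositionMaskingHead(nn.Module):
    """Hypothetical position-masking classifier head.

    Mirrors the MLM token head: a fully connected layer maps each
    hidden state to a distribution over possible position indices.
    """

    def __init__(self, hidden_size: int, max_positions: int):
        super().__init__()
        self.classifier = nn.Linear(hidden_size, max_positions)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # (batch, seq_len, hidden) -> (batch, seq_len, max_positions)
        return self.classifier(hidden_states)


def mask_positions(position_ids: torch.Tensor, mask_prob: float = 0.15,
                   mask_position_id: int = 0):
    """Randomly replace a fraction of position ids with a sentinel id,
    analogous to replacing token ids with [MASK]. Returns the corrupted
    position ids and labels for the masked slots (-100 elsewhere, the
    ignore_index used by nn.CrossEntropyLoss). The sentinel value and
    the 15% ratio are assumptions for illustration only."""
    mask = torch.rand_like(position_ids, dtype=torch.float) < mask_prob
    labels = position_ids.masked_fill(~mask, -100)
    corrupted = position_ids.masked_fill(mask, mask_position_id)
    return corrupted, labels


if __name__ == "__main__":
    batch, seq_len, hidden, max_pos = 2, 128, 768, 512
    position_ids = torch.arange(seq_len).expand(batch, seq_len)
    corrupted_ids, labels = mask_positions(position_ids)

    # Stand-in for encoder output; in practice these hidden states would
    # come from a BERT-style model fed with the corrupted position ids.
    hidden_states = torch.randn(batch, seq_len, hidden)

    head = PositionMaskingHead(hidden, max_pos)
    logits = head(hidden_states)
    loss = nn.CrossEntropyLoss()(logits.reshape(-1, max_pos),
                                 labels.reshape(-1))
    print(loss.item())
```

In a full pre-training setup this position loss would simply be added to the usual MLM (and next-sentence) losses; the sketch above only shows the extra head and the corruption of position ids.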