Paper Title

PERT: Pre-training BERT with Permuted Language Model

Paper Authors

Yiming Cui, Ziqing Yang, Ting Liu

Abstract

Pre-trained Language Models (PLMs) have been widely used in various natural language processing (NLP) tasks, owing to their powerful text representations trained on large-scale corpora. In this paper, we propose a new PLM called PERT for natural language understanding (NLU). PERT is an auto-encoding model (like BERT) trained with a Permuted Language Model (PerLM) objective. The formulation of the proposed PerLM is straightforward: we permute a proportion of the input text, and the training objective is to predict the position of the original token. We also apply whole word masking and N-gram masking to improve the performance of PERT. We carried out extensive experiments on both Chinese and English NLU benchmarks. The experimental results show that PERT brings improvements over various comparable baselines on some tasks, but not on others. These results indicate that it is possible to develop more diverse pre-training tasks beyond variants of the masked language model. Several quantitative studies are carried out to better understand PERT, which may help in designing PLMs in the future. Resources are available at: https://github.com/ymcui/PERT
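
To make the PerLM objective described above concrete, the following is a minimal Python sketch of how a training example might be constructed: a proportion of token positions is selected, the tokens at those positions are shuffled among themselves, and each selected position is labeled with the index at which its original token now sits. The function name, permutation ratio, and position selection are illustrative assumptions, not PERT's exact implementation; whole word masking and N-gram masking, which PERT also uses, are omitted here.

```python
import random


def build_perlm_example(tokens, permute_ratio=0.15, seed=None):
    """Illustrative PerLM example builder (a sketch, not PERT's exact recipe).

    Select a proportion of token positions, shuffle the tokens at those
    positions among themselves, and label each selected position with the
    index at which its original token now sits, so that the model learns to
    predict the position of the original token.
    """
    rng = random.Random(seed)
    n = len(tokens)
    num_permuted = min(n, max(2, int(n * permute_ratio)))

    # Positions whose tokens will be shuffled among themselves.
    positions = sorted(rng.sample(range(n), num_permuted))
    targets = positions[:]
    rng.shuffle(targets)

    permuted = list(tokens)
    for src, dst in zip(positions, targets):
        # The token originally at position `src` is moved to position `dst`.
        permuted[dst] = tokens[src]

    # Training labels: for each selected position, where its original token is now.
    labels = {src: dst for src, dst in zip(positions, targets)}
    return permuted, labels


if __name__ == "__main__":
    toks = "the quick brown fox jumps over the lazy dog".split()
    perm, labels = build_perlm_example(toks, seed=0)
    print(perm)    # permuted input fed to the encoder
    print(labels)  # position-prediction targets for the permuted slots
```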
