Paper Title

Better Language Model with Hypernym Class Prediction

Paper Authors

He Bai, Tong Wang, Alessandro Sordoni, Peng Shi

Paper Abstract

Class-based language models (LMs) have been long devised to address context sparsity in $n$-gram LMs. In this study, we revisit this approach in the context of neural LMs. We hypothesize that class-based prediction leads to an implicit context aggregation for similar words and thus can improve generalization for rare words. We map words that have a common WordNet hypernym to the same class and train large neural LMs by gradually annealing from predicting the class to token prediction during training. Empirically, this curriculum learning strategy consistently improves perplexity over various large, highly-performant state-of-the-art Transformer-based models on two datasets, WikiText-103 and Arxiv. Our analysis shows that the performance improvement is achieved without sacrificing performance on rare words. Finally, we document other attempts that failed to yield empirical gains, and discuss future directions for the adoption of class-based LMs on a larger scale.
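
The abstract describes two ingredients: mapping word types to classes via a shared WordNet hypernym, and annealing the training objective from class prediction to token prediction. Below is a minimal sketch of both ideas, assuming a PyTorch training loop and NLTK's WordNet interface; the function names, the fixed-depth hypernym choice, and the linear annealing schedule are illustrative assumptions, not the paper's released implementation.

```python
# Sketch only: illustrates hypernym-based class mapping and an annealed
# class-to-token loss. Identifiers such as `hypernym_class`, `anneal_weight`,
# and `mixed_loss` are hypothetical, chosen for this example.

import torch
import torch.nn.functional as F
from nltk.corpus import wordnet as wn  # requires nltk.download("wordnet")


def hypernym_class(word: str, depth: int = 4) -> str:
    """Map a word to a coarse class label: the hypernym at a fixed depth on the
    first noun synset's path from the WordNet root, falling back to the word
    itself when WordNet has no entry."""
    synsets = wn.synsets(word, pos=wn.NOUN)
    if not synsets:
        return word
    path = synsets[0].hypernym_paths()[0]          # root ... synset
    node = path[min(depth, len(path) - 1)]
    return node.name()                              # e.g. "animal.n.01"


def anneal_weight(step: int, total_steps: int) -> float:
    """Linearly decay the class-prediction weight from 1 to 0 over training
    (one possible schedule for 'gradually annealing' to token prediction)."""
    return max(0.0, 1.0 - step / total_steps)


def mixed_loss(token_logits: torch.Tensor,
               class_logits: torch.Tensor,
               token_targets: torch.Tensor,
               class_targets: torch.Tensor,
               alpha: float) -> torch.Tensor:
    """Interpolate class-level and token-level cross-entropy by weight alpha."""
    loss_cls = F.cross_entropy(class_logits, class_targets)
    loss_tok = F.cross_entropy(token_logits, token_targets)
    return alpha * loss_cls + (1.0 - alpha) * loss_tok
```

Under this reading, words such as "sparrow" and "falcon" would collapse to a common class like "bird.n.01" early in training, so their contexts are aggregated at the class level; as the weight anneals to zero, the model reverts to the standard token-level objective.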
