Paper Title
Beyond English-Centric Bitexts for Better Multilingual Language Representation Learning
Paper Authors
Paper Abstract
In this paper, we elaborate upon recipes for building multilingual representation models that are not only competitive with existing state-of-the-art models but are also more parameter-efficient, thereby promoting better adoption in resource-constrained scenarios and practical applications. We show that going beyond English-centric bitexts, coupled with a novel sampling strategy aimed at reducing under-utilization of training data, substantially boosts performance across model sizes for both Electra and MLM pre-training objectives. We introduce XY-LENT: X-Y bitext enhanced Language ENcodings using Transformers, which not only achieves state-of-the-art performance on 5 cross-lingual tasks within all model size bands, but is also competitive across bands. Our XY-LENT XL variant outperforms XLM-R XXL and exhibits competitive performance with mT5 XXL while being 5x and 6x smaller, respectively. We then show that our proposed method helps ameliorate the curse of multilinguality, with XY-LENT XL achieving 99.3% GLUE performance and 98.5% SQuAD 2.0 performance compared to a SoTA English-only model in the same size band. Finally, we analyze our model's performance on extremely low-resource languages and posit that scaling alone may not be sufficient for improving performance in this scenario.
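The abstract mentions a sampling strategy for multilingual training data but does not spell it out. As background, a minimal sketch of the standard temperature-based (exponent-smoothed) corpus sampling baseline used in prior multilingual models such as XLM-R is shown below; this is not the paper's proposed strategy, and the corpus sizes and the `alpha` value are illustrative placeholders only.

```python
import numpy as np

# Illustrative per-language token counts (placeholders, not from the paper).
corpus_tokens = {"en": 3.0e11, "fr": 5.6e10, "sw": 2.8e8, "yo": 4.0e7}

def sampling_probs(sizes, alpha=0.3):
    """Exponentiate the raw corpus-size distribution by `alpha` and
    renormalize, up-weighting low-resource languages relative to their
    raw share of the data (alpha=1 keeps the raw distribution)."""
    p = np.array(list(sizes.values()), dtype=float)
    p /= p.sum()      # raw data distribution over languages
    p = p ** alpha    # temperature / exponent smoothing
    return dict(zip(sizes.keys(), p / p.sum()))

# Example: languages are then drawn per batch according to these probabilities.
print(sampling_probs(corpus_tokens))
```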