Paper Title
Simulated Multiple Reference Training Improves Low-Resource Machine Translation
Authors
Abstract
Many valid translations exist for a given sentence, yet machine translation (MT) is trained with a single reference translation, exacerbating data sparsity in low-resource settings. We introduce Simulated Multiple Reference Training (SMRT), a novel MT training method that approximates the full space of possible translations by sampling a paraphrase of the reference sentence from a paraphraser and training the MT model to predict the paraphraser's distribution over possible tokens. We demonstrate the effectiveness of SMRT in low-resource settings when translating to English, with improvements of 1.2 to 7.0 BLEU. We also find SMRT is complementary to back-translation.
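The training idea described in the abstract can be illustrated with a toy sketch. It assumes that "predicting the paraphraser's distribution" means minimizing a per-token cross-entropy between the paraphraser's soft distribution and the MT model's distribution, and that both models are conditioned on the same sampled paraphrase prefix. The abstract does not spell out the exact loss, so this is an illustrative reading rather than the authors' implementation. All names here (`sample_token`, `soft_cross_entropy`, `smrt_step_losses`, and the two toy models) are hypothetical.

```python
import math
import random

EOS = "</s>"

def sample_token(dist, rng):
    """Draw one token from a {token: probability} distribution."""
    r = rng.random()
    cumulative = 0.0
    for token, prob in dist.items():
        cumulative += prob
        if r < cumulative:
            return token
    return token  # guard against floating-point rounding

def soft_cross_entropy(target_dist, model_dist):
    """Cross-entropy of model_dist against a soft (non one-hot) target."""
    return -sum(p * math.log(model_dist[tok])
                for tok, p in target_dist.items() if p > 0.0)

def smrt_step_losses(paraphraser, mt_model, max_len, rng):
    """Sample a paraphrase token by token and collect per-step losses.

    paraphraser(prefix) and mt_model(prefix) are assumed to return
    {token: probability} dicts; in a real system they would also be
    conditioned on the reference and the source sentence respectively.
    """
    prefix, losses = [], []
    for _ in range(max_len):
        p = paraphraser(prefix)   # soft target over the vocabulary
        q = mt_model(prefix)      # MT model prediction for the same prefix
        losses.append(soft_cross_entropy(p, q))
        token = sample_token(p, rng)  # the sample extends the shared prefix
        if token == EOS:
            break
        prefix.append(token)
    return prefix, losses

# Toy stand-ins for the two models (assumptions for illustration only).
def toy_paraphraser(prefix):
    if not prefix:
        return {"hello": 0.5, "hi": 0.5, EOS: 0.0}
    return {"hello": 0.0, "hi": 0.0, EOS: 1.0}

def toy_mt_model(prefix):
    return {"hello": 1 / 3, "hi": 1 / 3, EOS: 1 / 3}

paraphrase, losses = smrt_step_losses(
    toy_paraphraser, toy_mt_model, max_len=5, rng=random.Random(0))
print(paraphrase, sum(losses))
```

Because the target at each step is a full distribution rather than a single reference token, the model receives credit for every token the paraphraser considers plausible, which is one way to read the abstract's claim about approximating the space of possible translations.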