Simplify-then-Translate: Automatic Preprocessing for Black-Box Machine Translation

Paper title

Simplify-then-Translate: Automatic Preprocessing for Black-Box Machine Translation

Authors

Sneha Mehta, Bahareh Azarnoush, Boris Chen, Avneesh Saluja, Vinith Misra, Ballav Bihani, Ritwik Kumar

Abstract

Black-box machine translation systems have proven incredibly useful for a variety of applications yet by design are hard to adapt, tune to a specific domain, or build on top of. In this work, we introduce a method to improve such systems via automatic pre-processing (APP) using sentence simplification. We first propose a method to automatically generate a large in-domain paraphrase corpus through back-translation with a black-box MT system, which is used to train a paraphrase model that "simplifies" the original sentence to be more conducive for translation. The model is used to preprocess source sentences of multiple low-resource language pairs. We show that this preprocessing leads to better translation performance as compared to non-preprocessed source sentences. We further perform side-by-side human evaluation to verify that translations of the simplified sentences are better than the original ones. Finally, we provide some guidance on recommended language pairs for generating the simplification model corpora by investigating the relationship between ease of translation of a language pair (as measured by BLEU) and quality of the resulting simplification model from back-translations of this language pair (as measured by SARI), and tie this into the downstream task of low-resource translation.
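The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Translator` signature, the function names, the default pivot language, and the rule that drops unchanged round trips are all assumptions made here, and the trained simplification model is represented only by a `simplify` callable.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical interface for a black-box MT system:
# (text, source_lang, target_lang) -> translated text.
Translator = Callable[[str, str, str], str]


def back_translate(sentence: str, translate: Translator, src: str, pivot: str) -> str:
    """Round-trip a sentence src -> pivot -> src using only the black-box system."""
    pivot_text = translate(sentence, src, pivot)
    return translate(pivot_text, pivot, src)


def build_paraphrase_corpus(
    sentences: Iterable[str],
    translate: Translator,
    src: str = "en",
    pivot: str = "es",  # assumed pivot; the abstract does not fix one
) -> List[Tuple[str, str]]:
    """Pair each in-domain sentence with its back-translation.

    Round trips that return the sentence unchanged are dropped, because they
    carry no paraphrase signal (a filtering choice assumed for this sketch).
    """
    pairs: List[Tuple[str, str]] = []
    for original in sentences:
        paraphrase = back_translate(original, translate, src, pivot)
        same = paraphrase.strip().lower() == original.strip().lower()
        if paraphrase.strip() and not same:
            pairs.append((original, paraphrase))
    return pairs


def translate_with_app(
    sentence: str,
    simplify: Callable[[str], str],
    translate: Translator,
    src: str,
    tgt: str,
) -> str:
    """Automatic preprocessing (APP): simplify the source, then translate it."""
    return translate(simplify(sentence), src, tgt)
```

In this sketch, the `(original, back-translation)` pairs would serve as source and target when training the simplification model, and `translate_with_app` shows where that model sits relative to the black-box system at inference time.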
