Title
Machine Translation Robustness to Natural Asemantic Variation
Authors
Abstract
Current Machine Translation (MT) models still struggle with more challenging input, such as noisy data and tail-end words and phrases. Several works have addressed this robustness issue by identifying specific categories of noise and variation and then tuning models to perform better on them. An important yet under-studied category involves minor variations in nuance (non-typos) that preserve meaning with respect to the target language. We introduce and formalize this category as Natural Asemantic Variation (NAV) and investigate it in the context of MT robustness. We find that existing MT models fail when presented with NAV data, but we demonstrate strategies to improve performance on NAV by fine-tuning the models with human-generated variations. We also show that NAV robustness can be transferred across languages and find that synthetic perturbations can achieve some but not all of the benefits of organic NAV data.