Paper Title
LAVA NAT: A Non-Autoregressive Translation Model with Look-Around Decoding and Vocabulary Attention
Paper Authors
Paper Abstract
Non-autoregressive translation (NAT) models generate multiple tokens in one forward pass and are highly efficient at the inference stage compared with autoregressive translation (AT) methods. However, NAT models often suffer from the multimodality problem, i.e., generating duplicated tokens or missing tokens. In this paper, we propose two novel methods to address this issue: the Look-Around (LA) strategy and the Vocabulary Attention (VA) mechanism. The Look-Around strategy predicts the neighboring tokens in order to predict the current token, and the Vocabulary Attention mechanism models long-term token dependencies inside the decoder by attending to the whole vocabulary at each position to acquire knowledge of which token is about to be generated. We also propose a dynamic bidirectional decoding approach to accelerate the inference process of the LAVA model while preserving the high quality of the generated output. Our proposed model uses significantly less time during inference compared with autoregressive models and most other NAT models. Our experiments on four benchmarks (WMT14 En$\rightarrow$De, WMT14 De$\rightarrow$En, WMT16 Ro$\rightarrow$En and IWSLT14 De$\rightarrow$En) show that the proposed model achieves competitive performance compared with state-of-the-art non-autoregressive and autoregressive models while significantly reducing the time cost in the inference phase.
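To make the two mechanisms concrete, the following is a minimal NumPy sketch, not the authors' implementation: `vocabulary_attention` lets each decoder position attend over the entire embedding matrix and mixes the weighted vocabulary embedding back into its hidden state, while `look_around_logits` gives each position auxiliary predictions for its left and right neighbors alongside the current token. All function names, shapes, and the residual mix-in are illustrative assumptions made here, not details taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def vocabulary_attention(hidden, emb):
    """Vocabulary Attention (sketch): each decoder position attends
    over the whole vocabulary embedding matrix, and the weighted
    vocabulary embedding is mixed back into the hidden state.

    hidden: (seq_len, d_model) decoder hidden states
    emb:    (vocab_size, d_model) token embedding matrix
    """
    weights = softmax(hidden @ emb.T, axis=-1)  # (seq_len, vocab_size)
    return hidden + weights @ emb               # residual mix-in (assumed)

def look_around_logits(hidden, w_left, w_cur, w_right):
    """Look-Around (sketch): besides the current token, each position
    also predicts its left and right neighbors via separate
    (hypothetical) output projections of shape (d_model, vocab_size)."""
    return hidden @ w_left, hidden @ w_cur, hidden @ w_right

# Toy usage: 5 target positions, 32-dim states, 1000-word vocabulary.
rng = np.random.default_rng(0)
h = rng.standard_normal((5, 32))
E = rng.standard_normal((1000, 32))
h_va = vocabulary_attention(h, E)
w_l, w_c, w_r = (rng.standard_normal((32, 1000)) for _ in range(3))
left, cur, right = look_around_logits(h_va, w_l, w_c, w_r)
print(h_va.shape, cur.shape)  # (5, 32) (5, 1000)
```

In this reading, the vocabulary-attention weights act as a soft preview of which token each position is about to generate, and the neighbor logits supply the look-around signal that discourages adjacent positions from emitting duplicated or missing tokens; how these pieces are actually wired and trained is specified in the paper itself.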