On Target Segmentation for Direct Speech Translation

Paper Title

On Target Segmentation for Direct Speech Translation

Authors

Mattia Antonino Di Gangi, Marco Gaido, Matteo Negri, Marco Turchi

Abstract

Recent studies on direct speech translation show continuous improvements by means of data augmentation techniques and bigger deep learning models. While these methods are helping to close the gap between this new approach and the more traditional cascaded one, there are many incongruities among different studies that make it difficult to assess the state of the art. Surprisingly, one point of discussion is the segmentation of the target text. Character-level segmentation was initially proposed to obtain an open vocabulary, but it results in long sequences and long training times. Subword-level segmentation then became the state of the art in neural machine translation, as it produces shorter sequences that reduce training time while being superior to word-level models. As such, recent works on speech translation have started using target subwords, despite the initial use of characters and some recent claims of better results at the character level. In this work, we perform an extensive comparison of the two methods on three benchmarks covering 8 language directions and multilingual training. Subword-level segmentation compares favorably in all settings, outperforming its character-level counterpart by 1 to 3 BLEU points.
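
The sequence-length difference mentioned in the abstract can be illustrated with a toy sketch. It is not the paper's method: the vocabulary `TOY_VOCAB` and both helper functions are hypothetical, and the greedy longest-match segmenter only stands in for a learned subword model such as byte-pair encoding, which would derive its vocabulary from training data.

```python
# Hypothetical hand-written vocabulary; a real subword model learns this from data.
TOY_VOCAB = {"direct", "speech", "trans", "lation"}


def char_segment(text):
    """Character-level segmentation: one token per character.

    Spaces are replaced by "_" so that word boundaries stay visible.
    """
    return ["_" if ch == " " else ch for ch in text]


def subword_segment(text, vocab):
    """Greedy longest-match segmentation over a fixed vocabulary.

    Pieces not covered by the vocabulary fall back to single characters,
    so the segmentation is still open-vocabulary.
    """
    tokens = []
    for word in text.split():
        i = 0
        while i < len(word):
            # Try the longest candidate first; a single character always matches.
            for j in range(len(word), i, -1):
                piece = word[i:j]
                if piece in vocab or j == i + 1:
                    tokens.append(piece)
                    i = j
                    break
    return tokens


sentence = "direct speech translation"
chars = char_segment(sentence)
subwords = subword_segment(sentence, TOY_VOCAB)
print(len(chars), chars)
print(len(subwords), subwords)
```

On this one sentence the character-level target is several times longer than the subword-level one, which is the effect the abstract links to longer training times. The size of the gap on real data depends on the language and on the subword vocabulary size.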
