Paper Title
Improving Speech-to-Speech Translation Through Unlabeled Text
Paper Authors
Paper Abstract
Direct speech-to-speech translation (S2ST) is among the most challenging problems in the translation paradigm due to the severe scarcity of S2ST data. While efforts have been made to increase the data size from unlabeled speech by cascading pretrained speech recognition (ASR), machine translation (MT), and text-to-speech (TTS) models, unlabeled text has remained relatively under-utilized for improving S2ST. We propose an effective way to use the massive amount of existing unlabeled text in different languages to create large quantities of synthetic S2ST data, applying various acoustic effects to the generated audio, and thereby improve S2ST performance. Empirically, our method outperforms the state of the art in Spanish-English translation by up to 2 BLEU. The proposed method also yields significant gains in extremely low-resource settings for both Spanish-English and Russian-English translation.
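The abstract describes the data-creation recipe only at a high level: unlabeled text is turned into paired source and target speech with pretrained models, and acoustic effects are applied to the synthetic audio. The following is a minimal Python sketch of that kind of recipe, assuming a single translation direction and pure-Python waveforms (lists of float samples). The `translate`, `synthesize_source`, and `synthesize_target` callables are hypothetical stand-ins for pretrained MT and TTS models. The helper names (`add_noise`, `change_gain`, `change_speed`, `augment`, `make_synthetic_pairs`) and the effect ranges are illustrative assumptions, not the authors' implementation.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical types and helpers for illustration only; not the paper's code.
Waveform = List[float]


@dataclass
class S2STPair:
    source_text: str
    target_text: str
    source_speech: Waveform
    target_speech: Waveform


def add_noise(wave: Waveform, snr_db: float, rng: random.Random) -> Waveform:
    """Add Gaussian white noise scaled to the requested signal-to-noise ratio."""
    if not wave:
        return []
    signal_power = sum(x * x for x in wave) / len(wave)
    if signal_power == 0.0:
        return list(wave)  # silence: no meaningful SNR, leave unchanged
    noise_std = math.sqrt(signal_power / (10 ** (snr_db / 10)))
    return [x + rng.gauss(0.0, noise_std) for x in wave]


def change_gain(wave: Waveform, gain_db: float) -> Waveform:
    """Scale the amplitude by gain_db decibels."""
    factor = 10 ** (gain_db / 20)
    return [x * factor for x in wave]


def change_speed(wave: Waveform, rate: float) -> Waveform:
    """Naive speed change by linear-interpolation resampling (rate > 1 is faster).

    Like any plain resampling, this also shifts pitch.
    """
    if not wave:
        return []
    n_out = max(1, round(len(wave) / rate))
    out = []
    for i in range(n_out):
        pos = i * rate
        lo = int(pos)
        if lo >= len(wave) - 1:
            out.append(wave[-1])  # clamp at the last sample
        else:
            frac = pos - lo
            out.append(wave[lo] * (1 - frac) + wave[lo + 1] * frac)
    return out


def augment(wave: Waveform, rng: random.Random) -> Waveform:
    """Apply a random chain of acoustic effects (ranges are illustrative assumptions)."""
    wave = change_speed(wave, rng.uniform(0.9, 1.1))
    wave = change_gain(wave, rng.uniform(-6.0, 6.0))
    return add_noise(wave, rng.uniform(10.0, 30.0), rng)


def make_synthetic_pairs(
    texts: Sequence[str],
    translate: Callable[[str], str],
    synthesize_source: Callable[[str], Waveform],
    synthesize_target: Callable[[str], Waveform],
    rng: random.Random,
) -> List[S2STPair]:
    """Turn unlabeled source-language text into synthetic S2ST training pairs."""
    pairs = []
    for src_text in texts:
        tgt_text = translate(src_text)  # MT step (hypothetical model)
        src_speech = augment(synthesize_source(src_text), rng)  # TTS, then effects
        tgt_speech = synthesize_target(tgt_text)  # target speech kept clean
        pairs.append(S2STPair(src_text, tgt_text, src_speech, tgt_speech))
    return pairs
```

This sketch perturbs only the source-side speech, on the assumption that the model input should vary while its output target stays clean; the abstract does not say which side receives the effects or which effects are used. Because naive resampling also shifts pitch, a practical pipeline would more likely rely on a dedicated audio library for speed and noise effects.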