Paper Title
Fluent and Low-latency Simultaneous Speech-to-Speech Translation with Self-adaptive Training
Paper Authors
Paper Abstract
Simultaneous speech-to-speech translation is widely useful but extremely challenging, since it needs to generate target-language speech concurrently with the source-language speech, with a delay of only a few seconds. In addition, it needs to continuously translate a stream of sentences, but all recent solutions focus merely on the single-sentence scenario. As a result, current approaches accumulate latency progressively when the speaker talks faster, and introduce unnatural pauses when the speaker talks slower. To overcome these issues, we propose Self-Adaptive Translation (SAT), which flexibly adjusts the length of translations to accommodate different source speech rates. At similar levels of translation quality (as measured by BLEU), our method generates more fluent target speech (as measured by the naturalness metric MOS) with substantially lower latency than the baseline, in both Zh <-> En directions.