Speech Segmentation Optimization using Segmented Bilingual Speech Corpus for End-to-end Speech Translation

Authors

Ryo Fukuda, Katsuhito Sudoh, Satoshi Nakamura

Abstract

Speech segmentation, which splits long speech into short segments, is essential for speech translation (ST). Popular VAD tools like WebRTC VAD have generally relied on pause-based segmentation. Unfortunately, pauses in speech do not necessarily match sentence boundaries, and sentences can be connected by a very short pause that is difficult to detect by VAD. In this study, we propose a speech segmentation method using a binary classification model trained using a segmented bilingual speech corpus. We also propose a hybrid method that combines VAD and the above speech segmentation method. Experimental results revealed that the proposed method is more suitable for cascade and end-to-end ST systems than conventional segmentation methods. The hybrid approach further improved the translation performance.
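The abstract does not say how the hybrid method combines the two signals, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes frame-level inputs: a hypothetical classifier output `boundary_prob` (probability that a segment boundary falls at each frame) and a hypothetical VAD output `is_speech`. The combination rule (cut when either the classifier fires or a pause is long enough) and all names and thresholds are illustrative choices.

```python
from typing import List, Tuple


def hybrid_segments(
    boundary_prob: List[float],
    is_speech: List[bool],
    prob_threshold: float = 0.5,   # assumed classifier decision threshold
    min_pause_frames: int = 3,     # assumed minimum pause length for a VAD cut
) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive.

    Illustrative OR rule: the open segment is closed at frame i when the
    classifier marks a boundary there, or when the run of non-speech
    frames ending at i reaches min_pause_frames.
    """
    if len(boundary_prob) != len(is_speech):
        raise ValueError("inputs must have the same number of frames")

    segments: List[Tuple[int, int]] = []
    start = None        # first frame of the open segment, if any
    last_speech = None  # most recent speech frame in the open segment
    pause_run = 0       # consecutive non-speech frames ending at frame i

    for i, (prob, speech) in enumerate(zip(boundary_prob, is_speech)):
        pause_run = 0 if speech else pause_run + 1
        if start is not None:
            classifier_cut = prob >= prob_threshold
            vad_cut = pause_run >= min_pause_frames
            if classifier_cut or vad_cut:
                # Trim trailing silence: end right after the last speech frame.
                segments.append((start, last_speech + 1))
                start = None
        if speech:
            if start is None:
                start = i
            last_speech = i

    if start is not None:
        segments.append((start, last_speech + 1))
    return segments


speech = [True, True, True, False, True, True, False, False, False, True, True]
prob = [0.0, 0.0, 0.0, 0.9, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

print(hybrid_segments(prob, speech))        # [(0, 3), (4, 6), (9, 11)]
print(hybrid_segments([0.0] * 11, speech))  # [(0, 6), (9, 11)]
```

In the second call the classifier never fires, so the one-frame pause at frame 3 is too short for the pause rule and the first two utterances stay merged. This is the failure case of pause-based segmentation that the abstract describes.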
