UWSpeech: Speech to Speech Translation for Unwritten Languages

Paper Title

UWSpeech: Speech to Speech Translation for Unwritten Languages

Authors

Zhang, Chen; Tan, Xu; Ren, Yi; Qin, Tao; Zhang, Kejun; Liu, Tie-Yan

Abstract

Existing speech-to-speech translation systems rely heavily on text in the target language: they either translate source speech into target text and then synthesize target speech from that text, or translate directly into target speech while using target text for auxiliary training. However, these methods cannot be applied to unwritten target languages, which have no written text or phonemes available. In this paper, we develop a translation system for unwritten languages, named UWSpeech, which converts target unwritten speech into discrete tokens with a converter, then translates source-language speech into target discrete tokens with a translator, and finally synthesizes target speech from the target discrete tokens with an inverter. We propose a method called XL-VAE, which enhances the vector quantized variational autoencoder (VQ-VAE) with cross-lingual (XL) speech recognition, to train the converter and inverter of UWSpeech jointly. Experiments on the Fisher Spanish-English conversation translation dataset show that UWSpeech outperforms the direct translation and VQ-VAE baselines by about 16 and 10 BLEU points, respectively, which demonstrates the advantages and potential of UWSpeech.
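The converter, translator, inverter data flow described in the abstract can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the authors' implementation: the function names (`converter`, `inverter`, `build_translation_pairs`, `speech_to_speech`) are hypothetical, the converter is reduced to the nearest-codebook lookup that forms the quantization step of a VQ-VAE, and the inverter is reduced to a codebook lookup. In UWSpeech both are learned neural networks trained jointly with XL-VAE, and the cross-lingual speech recognition component is not modeled here. The translator is passed in as any callable that maps source speech to target tokens.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy types: a "frame" is one acoustic feature vector.
Frame = Sequence[float]
Speech = Sequence[Frame]


def squared_distance(a: Frame, b: Frame) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def converter(target_speech: Speech, codebook: Sequence[Frame]) -> List[int]:
    """Toy converter: replace each frame with the index of its nearest codebook entry.

    This is only the vector-quantization step of a VQ-VAE; a real converter
    would first pass the frames through a learned encoder.
    """
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(frame, codebook[k]))
        for frame in target_speech
    ]


def inverter(tokens: Sequence[int], codebook: Sequence[Frame]) -> List[List[float]]:
    """Toy inverter: map each token back to its codebook vector.

    A real inverter is a learned decoder that synthesizes speech from tokens.
    """
    return [list(codebook[t]) for t in tokens]


def build_translation_pairs(
    pairs: Sequence[Tuple[Speech, Speech]], codebook: Sequence[Frame]
) -> List[Tuple[Speech, List[int]]]:
    """Turn (source speech, target speech) pairs into (source speech, target tokens)
    pairs, i.e. translator training data that needs no target text."""
    return [(src, converter(tgt, codebook)) for src, tgt in pairs]


def speech_to_speech(
    source_speech: Speech,
    translator: Callable[[Speech], List[int]],
    codebook: Sequence[Frame],
) -> List[List[float]]:
    """Inference path: source speech -> target tokens -> target speech frames."""
    return inverter(translator(source_speech), codebook)


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
    target = [[0.1, -0.1], [0.9, 1.2], [0.2, 0.8]]
    print(converter(target, codebook))  # [0, 1, 2]

    def fake_translator(src: Speech) -> List[int]:
        # Stand-in translator that ignores its input and emits fixed tokens.
        return [2, 1]

    print(speech_to_speech([[0.5, 0.5]], fake_translator, codebook))
    # [[0.0, 1.0], [1.0, 1.0]]
```

The point the sketch tries to make is that the discrete tokens take the place of target text: the converter is used to build (source speech, target tokens) pairs for training the translator, while inference runs only the translator followed by the inverter.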
