Title
Phonological Features for 0-shot Multilingual Speech Synthesis
Authors
Abstract
Code-switching, the intra-utterance use of multiple languages, is prevalent across the world. Within text-to-speech (TTS), multilingual models have been found to enable code-switching. By modifying the linguistic input to sequence-to-sequence TTS, we show that code-switching is possible for languages unseen during training, even within monolingual models. We use a small set of phonological features derived from the International Phonetic Alphabet (IPA), such as vowel height and frontness and consonant place and manner. This allows the model topology to stay unchanged across languages and enables the model to interpret new, previously unseen feature combinations. We show that this allows us to generate intelligible, code-switched speech in a new language at test time, including approximations of sounds never seen in training.
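The core idea, replacing phone identities with IPA-derived phonological features, can be illustrated with a minimal Python sketch. It assumes a multi-hot encoding over a hypothetical feature inventory (`FEATURES`) and a hypothetical IPA lookup table (`PHONES`). Neither is the paper's actual feature set, and the helper names `encode`, `unseen_features` and `is_new_combination` are illustrative only. The sketch shows why the input width stays fixed across languages, and how a sound absent from training, such as the French close front rounded vowel /y/ in "tu" /ty/, can still be a combination of feature values that all occur in training.

```python
from typing import Dict, FrozenSet, Iterable, List, Set

# Hypothetical feature inventory loosely following IPA chart dimensions.
# This is NOT the paper's exact feature set; it only illustrates the idea.
FEATURES: List[str] = [
    "vowel", "consonant",                                       # segment type
    "close", "close-mid", "mid", "open-mid", "open",            # vowel height
    "front", "central", "back",                                 # vowel frontness
    "rounded",                                                  # lip rounding
    "bilabial", "labiodental", "alveolar", "postalveolar",      # consonant place
    "palatal", "velar", "glottal",
    "plosive", "nasal", "fricative", "approximant", "lateral",  # consonant manner
    "voiced",                                                   # phonation
]
INDEX: Dict[str, int] = {name: i for i, name in enumerate(FEATURES)}

# Hypothetical IPA-to-feature table; a real system would cover full inventories.
PHONES: Dict[str, FrozenSet[str]] = {
    "p": frozenset({"consonant", "bilabial", "plosive"}),
    "t": frozenset({"consonant", "alveolar", "plosive"}),
    "k": frozenset({"consonant", "velar", "plosive"}),
    "s": frozenset({"consonant", "alveolar", "fricative"}),
    "m": frozenset({"consonant", "bilabial", "nasal", "voiced"}),
    "n": frozenset({"consonant", "alveolar", "nasal", "voiced"}),
    "l": frozenset({"consonant", "alveolar", "lateral", "approximant", "voiced"}),
    "i": frozenset({"vowel", "close", "front", "voiced"}),
    "u": frozenset({"vowel", "close", "back", "rounded", "voiced"}),
    "ə": frozenset({"vowel", "mid", "central", "voiced"}),
    "y": frozenset({"vowel", "close", "front", "rounded", "voiced"}),
    "h": frozenset({"consonant", "glottal", "fricative"}),
}


def encode(phones: Iterable[str]) -> List[List[int]]:
    """Map a phone sequence to multi-hot vectors of fixed width len(FEATURES)."""
    vectors = []
    for phone in phones:
        vec = [0] * len(FEATURES)
        for feature in PHONES[phone]:
            vec[INDEX[feature]] = 1
        vectors.append(vec)
    return vectors


def unseen_features(phone: str, training_phones: Iterable[str]) -> Set[str]:
    """Feature values of `phone` that never occur in the training inventory."""
    seen: Set[str] = set()
    for p in training_phones:
        seen |= PHONES[p]
    return set(PHONES[phone]) - seen


def is_new_combination(phone: str, training_phones: Iterable[str]) -> bool:
    """True if `phone` is absent from training but all its feature values were seen."""
    training = set(training_phones)
    return phone not in training and not unseen_features(phone, training)


if __name__ == "__main__":
    # Hypothetical monolingual training inventory containing neither /y/ nor /h/.
    train = {"p", "t", "k", "s", "m", "n", "l", "i", "u", "ə"}
    # French "tu" /ty/: the vowel /y/ never appears in training.
    test_vectors = encode(["t", "y"])
    print(len(test_vectors), len(test_vectors[0]))  # 2 24
    print(is_new_combination("y", train))           # True: close+front from /i/, rounded from /u/
    print(unseen_features("h", train))              # {'glottal'}
```

Because every phone maps onto the same 24 positions, supporting a new language in this sketch only requires new table entries, not a new input layer. The /h/ case marks a limit of the idea: the `glottal` dimension is never active during training, so the model has no learned response to that input value.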