Speaker Independent and Multilingual/Mixlingual Speech-Driven Talking Head Generation Using Phonetic Posteriorgrams

Paper Title

Speaker Independent and Multilingual/Mixlingual Speech-Driven Talking Head Generation Using Phonetic Posteriorgrams

Authors

Huirong Huang, Zhiyong Wu, Shiyin Kang, Dongyang Dai, Jia Jia, Tianxiao Fu, Deyi Tuo, Guangzhi Lei, Peng Liu, Dan Su, Dong Yu, Helen Meng

Abstract

Generating 3D speech-driven talking heads has received increasing attention in recent years. Recent approaches mainly have the following limitations: 1) most speaker-independent methods need handcrafted features that are time-consuming to design or unreliable; 2) there is no convincing method that supports multilingual or mixlingual speech as input. In this work, we propose a novel approach using phonetic posteriorgrams (PPG). As a result, our method does not need handcrafted features and is more robust to noise than recent approaches. Furthermore, our method can support multilingual speech as input by building a universal phoneme space. As far as we know, our model is the first to support multilingual/mixlingual speech as input with convincing results. Objective and subjective experiments show that our model can generate high-quality animations from speech of unseen languages or speakers and remains robust to noise.
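As a rough illustration of the representation the abstract relies on: a phonetic posteriorgram is a frames-by-phonemes matrix in which each row is the posterior distribution over phonetic units that a speaker-independent acoustic model assigns to one speech frame. The minimal sketch below assumes the acoustic model has already produced per-frame logits, and models the "universal phoneme space" as the plain union of per-language phoneme inventories. The function names (`build_universal_phoneme_space`, `phonetic_posteriorgram`) and the phone labels are hypothetical and not taken from the paper, whose actual recognizer and phoneme set may differ.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one frame of phoneme logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def build_universal_phoneme_space(phone_sets: Dict[str, Sequence[str]]) -> List[str]:
    """Hypothetical universal phoneme space: the union of per-language
    phoneme inventories, sorted so each phoneme gets a stable column index."""
    universal = set()
    for phones in phone_sets.values():
        universal.update(phones)
    return sorted(universal)


def phonetic_posteriorgram(frame_logits: Sequence[Sequence[float]]) -> List[List[float]]:
    """Turn a (frames x phonemes) matrix of acoustic-model logits into a PPG:
    each row becomes a probability distribution over the phoneme space."""
    return [softmax(frame) for frame in frame_logits]


if __name__ == "__main__":
    # Assumed toy inventories; shared units such as silence appear only once.
    space = build_universal_phoneme_space(
        {"en": ["AA", "B", "sil"], "zh": ["a", "b", "sil"]}
    )
    # Assumed logits for two frames, one column per phoneme in `space`.
    ppg = phonetic_posteriorgram([[2.0, 0.5, 0.1, 0.1, -1.0], [0.0] * 5])
    for row in ppg:
        print([round(p, 3) for p in row])
```

Because every PPG row lives in the same shared phoneme space regardless of the input language, speech in different languages (or a mixture within one utterance) maps to the same kind of input for a downstream animation model; that is the intuition this toy example is meant to convey, not a reproduction of the authors' pipeline.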