Paper Title
SpeechPainter: Text-conditioned Speech Inpainting
Paper Authors
Paper Abstract
We propose SpeechPainter, a model for filling in gaps of up to one second in speech samples by leveraging an auxiliary textual input. We demonstrate that the model performs speech inpainting with the appropriate content, while maintaining speaker identity, prosody and recording environment conditions, and generalizing to unseen speakers. Our approach significantly outperforms baselines constructed using adaptive TTS, as judged by human raters in side-by-side preference and MOS tests.
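The abstract defines the task only at a high level. A gap of up to one second is removed from a speech sample, and the model receives the transcript as an auxiliary text input. The sketch below illustrates how one such training or evaluation example could be constructed. It is not the authors' pipeline, which the abstract does not describe. It assumes a raw waveform stored as a list of floats and a zero-filled gap. The names `make_inpainting_example` and `InpaintingExample` are hypothetical.

```python
from dataclasses import dataclass
from typing import List

MAX_GAP_SECONDS = 1.0  # upper bound on gap length stated in the abstract


@dataclass
class InpaintingExample:
    masked_audio: List[float]  # waveform with the gap zeroed out (assumed fill value)
    mask: List[bool]           # True where samples were removed
    transcript: str            # full text, including the words spoken inside the gap


def make_inpainting_example(audio: List[float], sample_rate: int,
                            gap_start_s: float, gap_len_s: float,
                            transcript: str) -> InpaintingExample:
    """Hypothetical helper: cut a gap of at most one second out of a waveform."""
    if not 0.0 < gap_len_s <= MAX_GAP_SECONDS:
        raise ValueError("gap length must be in (0, 1] seconds")
    start = int(round(gap_start_s * sample_rate))
    end = start + int(round(gap_len_s * sample_rate))
    if start < 0 or end > len(audio):
        raise ValueError("gap must lie inside the audio")
    # Mark the samples in [start, end) as missing and zero them out.
    mask = [start <= i < end for i in range(len(audio))]
    masked = [0.0 if m else x for x, m in zip(audio, mask)]
    return InpaintingExample(masked, mask, transcript)
```

For example, take 3 s of 16 kHz audio with a 0.5 s gap starting at 1.0 s. Samples 16000 through 23999 are masked (8000 samples in total), and the transcript is passed through unchanged. The model's job is to regenerate the masked region so that it matches the text and keeps the surrounding speaker, prosody, and recording conditions.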