Paper Title
SyntaSpeech: Syntax-Aware Generative Adversarial Text-to-Speech
Paper Authors
Paper Abstract
The recent progress in non-autoregressive text-to-speech (NAR-TTS) has made fast and high-quality speech synthesis possible. However, current NAR-TTS models usually use phoneme sequences as input and thus cannot understand the tree-structured syntactic information of the input sequence, which hurts prosody modeling. To this end, we propose SyntaSpeech, a syntax-aware and lightweight NAR-TTS model, which integrates tree-structured syntactic information into the prosody modeling modules of PortaSpeech \cite{ren2021portaspeech}. Specifically, 1) We build a syntactic graph based on the dependency tree of the input sentence, then process the text encoding with a syntactic graph encoder to extract the syntactic information. 2) We incorporate the extracted syntactic encoding into PortaSpeech to improve prosody prediction. 3) We introduce a multi-length discriminator to replace the flow-based post-net in PortaSpeech, which simplifies the training pipeline and improves the inference speed while preserving the naturalness of the generated audio. Experiments on three datasets not only show that the tree-structured syntactic information grants SyntaSpeech the ability to synthesize better audio with expressive prosody, but also demonstrate the generalization ability of SyntaSpeech to adapt to multiple languages and multi-speaker text-to-speech. Ablation studies demonstrate the necessity of each component in SyntaSpeech. Source code and audio samples are available at https://syntaspeech.github.io.
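To make step 1 of the abstract concrete, the following is a minimal sketch (not the authors' released code) of how a word-level syntactic graph could be derived from a dependency parse of the input sentence. It assumes spaCy as the dependency parser; SyntaSpeech's actual graph construction, edge typing, and word-to-phoneme alignment may differ.

```python
# Minimal sketch: build a word-level syntactic graph from a dependency parse.
# Assumption: spaCy ("en_core_web_sm") is used as the parser; the paper's own
# pipeline may construct and encode the graph differently.
import spacy
import torch

nlp = spacy.load("en_core_web_sm")

def build_syntactic_graph(sentence: str) -> torch.Tensor:
    """Return a symmetric word-level adjacency matrix (with self-loops)
    derived from the dependency tree, usable as input to a graph encoder."""
    doc = nlp(sentence)
    n = len(doc)
    adj = torch.eye(n)  # self-loops for every word node
    for tok in doc:
        if tok.i != tok.head.i:           # skip the root's self-reference
            adj[tok.i, tok.head.i] = 1.0  # edge: child -> head
            adj[tok.head.i, tok.i] = 1.0  # edge: head -> child (undirected)
    return adj

adj = build_syntactic_graph("SyntaSpeech integrates syntactic information into prosody modeling.")
print(adj.shape)  # (num_words, num_words)
```

In the model described by the abstract, a graph encoder would process such a graph together with the text encoding, and the resulting syntactic encoding would then condition PortaSpeech's prosody prediction; the adjacency matrix above is only one plausible interface for that encoder.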