Learning Joint Articulatory-Acoustic Representations with Normalizing Flows

Paper Title

Learning Joint Articulatory-Acoustic Representations with Normalizing Flows

Authors

Saha, Pramit, Fels, Sidney

Abstract

The articulatory geometric configurations of the vocal tract and the acoustic properties of the resultant speech sound are considered to have a strong causal relationship. This paper aims at finding a joint latent representation between the articulatory and acoustic domains for vowel sounds via invertible neural network models, while simultaneously preserving the respective domain-specific features. Our model utilizes a convolutional autoencoder architecture and normalizing flow-based models to allow both forward and inverse mappings, in a semi-supervised manner, between the mid-sagittal vocal tract geometry of a two degrees-of-freedom articulatory synthesizer with a 1D acoustic wave model and the Mel-spectrogram representation of the synthesized speech sounds. Our approach achieves satisfactory performance in both articulatory-to-acoustic and acoustic-to-articulatory mapping, thereby demonstrating our success in achieving a joint encoding of both domains.
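
The abstract relies on invertible models so that one network can map in both directions. As a rough illustration of that invertibility only, and not the authors' architecture, the sketch below implements a single affine coupling layer, a common normalizing-flow building block, in plain Python. The functions `toy_scale` and `toy_shift` are hypothetical stand-ins for the small neural networks a real flow would learn, and the four-element input vector is an arbitrary example.

```python
import math


def coupling_forward(x, scale_fn, shift_fn):
    """Affine coupling: pass the first half through, transform the second half."""
    if len(x) % 2 != 0:
        raise ValueError("this sketch assumes an even-length input")
    half = len(x) // 2
    x_a, x_b = x[:half], x[half:]
    s = scale_fn(x_a)  # log-scales, conditioned only on the untouched half
    t = shift_fn(x_a)  # shifts, conditioned only on the untouched half
    y_b = [xb * math.exp(si) + ti for xb, si, ti in zip(x_b, s, t)]
    log_det = sum(s)   # log |det Jacobian| of the transform
    return x_a + y_b, log_det


def coupling_inverse(y, scale_fn, shift_fn):
    """Exact inverse: the untouched half lets us recompute s and t."""
    if len(y) % 2 != 0:
        raise ValueError("this sketch assumes an even-length input")
    half = len(y) // 2
    y_a, y_b = y[:half], y[half:]
    s = scale_fn(y_a)
    t = shift_fn(y_a)
    x_b = [(yb - ti) * math.exp(-si) for yb, si, ti in zip(y_b, s, t)]
    return y_a + x_b


# Hypothetical conditioner functions; a trained flow would use neural networks here.
def toy_scale(h):
    return [math.tanh(sum(h))] * len(h)


def toy_shift(h):
    return [0.5 * v for v in h]


z = [0.3, -1.2, 0.8, 2.0]
y, log_det = coupling_forward(z, toy_scale, toy_shift)
z_rec = coupling_inverse(y, toy_scale, toy_shift)
```

Because the scale and shift depend only on the half that passes through unchanged, the inverse can recompute them exactly, so `z_rec` matches `z` up to floating-point error regardless of how complicated the conditioner functions are.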

Paper Title