Acoustic-to-articulatory Speech Inversion with Multi-task Learning

Paper Title

Acoustic-to-articulatory Speech Inversion with Multi-task Learning

Authors

Siriwardena, Yashish M., Sivaraman, Ganesh, Espy-Wilson, Carol

Abstract

Multi-task learning (MTL) frameworks have proven to be effective in diverse speech-related tasks like automatic speech recognition (ASR) and speech emotion recognition. This paper proposes an MTL framework to perform acoustic-to-articulatory speech inversion by simultaneously learning an acoustic-to-phoneme mapping as a shared task. We use the Haskins Production Rate Comparison (HPRC) database, which has both the electromagnetic articulography (EMA) data and the corresponding phonetic transcriptions. Performance of the system was measured by computing the correlation between estimated and actual tract variables (TVs) from the acoustic-to-articulatory speech inversion task. The proposed MTL-based Bidirectional Gated Recurrent Neural Network (RNN) model learns to map the input acoustic features to nine TVs while outperforming the baseline model trained to perform only acoustic-to-articulatory inversion.
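The evaluation metric mentioned in the abstract, correlation between estimated and actual TVs, can be illustrated with a minimal sketch. It assumes the Pearson product-moment correlation computed separately for each TV over frames; the abstract does not say which correlation variant or averaging scheme the authors used, and the function names and TV labels (`tv_1`, `tv_2`) below are hypothetical placeholders, not taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences of frame values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def tv_correlations(estimated, actual):
    """Per-TV correlation.

    Both arguments map a TV label to its list of per-frame values.
    (Assumed data layout, chosen only for this illustration.)
    """
    return {name: pearson(estimated[name], actual[name]) for name in actual}


# Toy values, not data from the paper.
actual = {"tv_1": [0.0, 1.0, 2.0, 3.0], "tv_2": [1.0, 0.5, 0.2, 0.9]}
estimated = {"tv_1": [0.1, 0.9, 2.2, 2.8], "tv_2": [0.8, 0.6, 0.1, 1.0]}

scores = tv_correlations(estimated, actual)
mean_score = sum(scores.values()) / len(scores)
print(scores, mean_score)
```

A score near 1 for a given TV means the estimated trajectory tracks the measured one closely; averaging over all TVs gives a single summary number for comparing two models.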
