Character-Aware Attention-Based End-to-End Speech Recognition

Paper Title

Character-Aware Attention-Based End-to-End Speech Recognition

Authors

Zhong Meng, Yashesh Gaur, Jinyu Li, Yifan Gong

Abstract

Predicting words and subword units (WSUs) as the output has been shown to be effective for the attention-based encoder-decoder (AED) model in end-to-end speech recognition. However, as one input to the decoder recurrent neural network (RNN), each WSU embedding is learned independently through context and acoustic information in a purely data-driven fashion. Little effort has been made to explicitly model the morphological relationships among WSUs. In this work, we propose a novel character-aware (CA) AED model in which each WSU embedding is computed by summarizing the embeddings of its constituent characters using a CA-RNN. This WSU-independent CA-RNN is jointly trained with the encoder, the decoder, and the attention network of a conventional AED to predict WSUs. With CA-AED, the embeddings of morphologically similar WSUs are naturally and directly correlated through the CA-RNN, in addition to the semantic and acoustic relations modeled by a traditional AED. Moreover, CA-AED significantly reduces the number of model parameters relative to a traditional AED by replacing the large pool of WSU embeddings with a much smaller set of character embeddings. On a 3400-hour Microsoft Cortana dataset, CA-AED achieves up to 11.9% relative WER improvement over a strong AED baseline with 27.1% fewer model parameters.
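The core idea in the abstract, computing each WSU embedding by running an RNN over its characters instead of looking it up in a per-WSU table, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class name `CharAwareEmbedder`, the plain tanh (Elman) recurrence, the use of the final hidden state as the summary, the tiny dimensions, and the 30,000-entry vocabulary are not taken from the paper, and the weights are random rather than trained.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random matrix stored as a list of row lists."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class CharAwareEmbedder:
    """Hypothetical CA-RNN stand-in: a tanh RNN over character embeddings."""

    def __init__(self, alphabet, char_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.char_dim = char_dim
        self.hidden_dim = hidden_dim
        # One small embedding per character, shared by every WSU.
        self.char_emb = {
            c: [rng.uniform(-0.1, 0.1) for _ in range(char_dim)] for c in alphabet
        }
        self.w_in = make_matrix(hidden_dim, char_dim, rng)     # input-to-hidden
        self.w_rec = make_matrix(hidden_dim, hidden_dim, rng)  # hidden-to-hidden
        self.bias = [0.0] * hidden_dim

    def embed(self, wsu):
        """Return the final hidden state after reading the WSU's characters."""
        h = [0.0] * self.hidden_dim
        for ch in wsu:
            x = self.char_emb[ch]
            h = [
                math.tanh(
                    sum(w * xi for w, xi in zip(self.w_in[j], x))
                    + sum(w * hi for w, hi in zip(self.w_rec[j], h))
                    + self.bias[j]
                )
                for j in range(self.hidden_dim)
            ]
        return h

    def num_params(self):
        """Character table plus RNN weights; independent of the WSU vocabulary size."""
        return (
            len(self.char_emb) * self.char_dim
            + self.hidden_dim * self.char_dim
            + self.hidden_dim * self.hidden_dim
            + self.hidden_dim
        )


def lookup_table_params(vocab_size, emb_dim):
    """Parameters of a conventional table with one embedding per WSU."""
    return vocab_size * emb_dim


embedder = CharAwareEmbedder("abcdefghijklmnopqrstuvwxyz", char_dim=8, hidden_dim=16)
vec_play = embedder.embed("play")
vec_plays = embedder.embed("plays")  # shares the "play" prefix computation path
ca_params = embedder.num_params()
table_params = lookup_table_params(vocab_size=30000, emb_dim=16)  # hypothetical vocabulary
```

In this toy setup the embedder's parameter count depends only on the alphabet size and the RNN dimensions, whereas the lookup table grows linearly with the number of WSUs. The figures produced here are illustrative only and say nothing about the 27.1% reduction reported in the abstract, which is measured over the whole model.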
