Paper Title

End-to-end contextual ASR based on posterior distribution adaptation for hybrid CTC/attention system

Paper Authors

Zhengyi Zhang, Pan Zhou

Paper Abstract

End-to-end (E2E) speech recognition architectures assemble all components of a traditional speech recognition system into a single model. Although this simplifies the ASR system, it introduces a contextual ASR drawback: the E2E model performs worse on utterances containing infrequent proper nouns. In this work, we propose adding a contextual bias attention (CBA) module to the attention-based encoder-decoder (AED) model to improve its ability to recognize contextual phrases. Specifically, CBA uses the context vector of the decoder's source attention to attend to a specific bias embedding. Jointly learned with the basic AED parameters, CBA can tell the model when and where to bias its output probability distribution. At the inference stage, a list of bias phrases is preloaded, and we adapt the posterior distributions of both the CTC and attention decoders according to the bias phrase attended by CBA. We evaluate the proposed method on GigaSpeech and achieve a consistent relative improvement in the recall rate of bias phrases, ranging from 15% to 28% over the baseline model. Meanwhile, our method shows strong anti-bias ability, as performance on general test sets degrades by only 1.7% even when 2,000 bias phrases are present.
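The abstract does not give implementation details, so the following is only a minimal sketch, under assumptions, of the two ideas it describes: attending from the decoder's source-attention context vector to a set of bias-phrase embeddings, and boosting the posterior of the attended phrase at inference. PyTorch is assumed, and all names here (ContextualBiasAttention, adapt_posterior, no_bias) are illustrative rather than the authors' actual code.

import torch
import torch.nn as nn
import torch.nn.functional as F


class ContextualBiasAttention(nn.Module):
    """Sketch of a CBA module: attend from the decoder's source-attention
    context vector to per-phrase bias embeddings. A learned 'no-bias' slot is
    appended so the module can choose not to bias (an assumption, not stated
    in the abstract)."""

    def __init__(self, context_dim: int, bias_dim: int, attn_dim: int):
        super().__init__()
        self.query_proj = nn.Linear(context_dim, attn_dim)
        self.key_proj = nn.Linear(bias_dim, attn_dim)
        self.no_bias = nn.Parameter(torch.zeros(1, bias_dim))  # "no bias" option

    def forward(self, source_context: torch.Tensor, bias_embeddings: torch.Tensor):
        # source_context: (batch, context_dim), context vector of the decoder's source attention
        # bias_embeddings: (num_phrases, bias_dim), one embedding per preloaded bias phrase
        keys = torch.cat([bias_embeddings, self.no_bias], dim=0)     # (P + 1, bias_dim)
        q = self.query_proj(source_context)                          # (batch, attn_dim)
        k = self.key_proj(keys)                                      # (P + 1, attn_dim)
        scores = q @ k.t() / k.size(-1) ** 0.5                       # scaled dot-product scores
        probs = F.softmax(scores, dim=-1)                            # which phrase (if any) to bias toward
        return probs


def adapt_posterior(log_probs: torch.Tensor, boost_token_ids, boost: float = 2.0) -> torch.Tensor:
    """Illustrative posterior adaptation: add a bonus to the log-probabilities
    of tokens belonging to the attended bias phrase, then renormalize. The same
    adaptation could be applied to both the CTC and attention-decoder outputs."""
    adapted = log_probs.clone()
    adapted[..., boost_token_ids] += boost          # boost tokens of the attended phrase
    return torch.log_softmax(adapted, dim=-1)       # renormalize to a valid distribution

In this sketch, the extra no-bias slot is what would let the model learn when and where to bias, matching the abstract's description; how the attention output is actually combined with the decoder state, and the exact form of the posterior adaptation, are choices the paper itself specifies.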
