Global Normalization for Streaming Speech Recognition in a Modular Framework

Paper Title

Global Normalization for Streaming Speech Recognition in a Modular Framework

Authors

Ehsan Variani, Ke Wu, Michael Riley, David Rybach, Matt Shannon, Cyril Allauzen

Abstract

We introduce the Globally Normalized Autoregressive Transducer (GNAT) for addressing the label bias problem in streaming speech recognition. Our solution admits a tractable exact computation of the denominator for the sequence-level normalization. Through theoretical and empirical results, we demonstrate that by switching to a globally normalized model, the word error rate gap between streaming and non-streaming speech-recognition models can be greatly reduced (by more than 50% on the LibriSpeech dataset). This model is developed in a modular framework which encompasses all the common neural speech recognition models. The modularity of this framework enables controlled comparison of modelling choices and creation of new models.
