Paper Title

Global Normalization for Streaming Speech Recognition in a Modular Framework

Authors

Ehsan Variani, Ke Wu, Michael Riley, David Rybach, Matt Shannon, Cyril Allauzen

Abstract

We introduce the Globally Normalized Autoregressive Transducer (GNAT) for addressing the label bias problem in streaming speech recognition. Our solution admits a tractable exact computation of the denominator for the sequence-level normalization. Through theoretical and empirical results, we demonstrate that by switching to a globally normalized model, the word error rate gap between streaming and non-streaming speech-recognition models can be greatly reduced (by more than 50% on the LibriSpeech dataset). This model is developed in a modular framework which encompasses all the common neural speech recognition models. The modularity of this framework enables controlled comparison of modelling choices and creation of new models.
