Paper title
Global Normalization for Streaming Speech Recognition in a Modular Framework
Paper authors
Paper abstract
We introduce the Globally Normalized Autoregressive Transducer (GNAT) to address the label bias problem in streaming speech recognition. Our solution admits a tractable exact computation of the denominator for the sequence-level normalization. Through theoretical and empirical results, we demonstrate that by switching to a globally normalized model, the word error rate gap between streaming and non-streaming speech recognition models can be greatly reduced (by more than 50% on the LibriSpeech dataset). This model is developed in a modular framework which encompasses all the common neural speech recognition models. The modularity of this framework enables controlled comparison of modelling choices and creation of new models.
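To make "sequence-level normalization with a tractable exact denominator" concrete, the following is a minimal sketch on a toy model. It is not the GNAT model and not the authors' implementation. It assumes a hypothetical first-order score table `scores[t][prev][cur]` (the score of label `cur` at frame `t` given the previous label `prev`), with label 0 doubling as the start symbol. Under that assumption, the sum over all label sequences can be computed exactly with a forward recursion instead of enumerating every sequence.

```python
import itertools
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def path_score(scores, labels):
    """Unnormalized log-score of one label sequence.

    scores[t][prev][cur] is a hypothetical score table (an assumption of
    this sketch); label 0 is used as the start symbol at t = 0.
    """
    total, prev = 0.0, 0
    for t, cur in enumerate(labels):
        total += scores[t][prev][cur]
        prev = cur
    return total


def log_denominator(scores):
    """Exact log-sum of exp(path_score) over all label sequences.

    Forward recursion: alpha[k] holds the log-sum over all prefixes that
    end in label k, so the cost is O(T * K^2) rather than O(K^T).
    """
    num_labels = len(scores[0][0])
    alpha = [scores[0][0][k] for k in range(num_labels)]
    for t in range(1, len(scores)):
        alpha = [
            logsumexp([alpha[j] + scores[t][j][k] for j in range(num_labels)])
            for k in range(num_labels)
        ]
    return logsumexp(alpha)


def log_prob(scores, labels):
    """Globally normalized log-probability of a full label sequence."""
    return path_score(scores, labels) - log_denominator(scores)


if __name__ == "__main__":
    T, K = 4, 3
    # Arbitrary deterministic toy scores, for illustration only.
    toy = [[[0.1 * (t + 1) * (j - k) + 0.05 * k for k in range(K)]
            for j in range(K)] for t in range(T)]
    brute = logsumexp([path_score(toy, seq)
                       for seq in itertools.product(range(K), repeat=T)])
    print(abs(brute - log_denominator(toy)) < 1e-9)
```

The brute-force check in the usage block enumerates all `K**T` sequences, which is feasible only for toy sizes. The forward recursion gives the same value and stays tractable as the sequence grows, which is the property the abstract refers to, here shown under the simplifying first-order assumption.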