Paper title
Analyzing the Quality and Stability of a Streaming End-to-End On-Device Speech Recognizer
Paper authors
Paper abstract
The demand for fast and accurate incremental speech recognition increases as the applications of automatic speech recognition (ASR) proliferate. Incremental speech recognizers output chunks of partially recognized words while the user is still talking. Partial results can be revised before the ASR finalizes its hypothesis, causing instability issues. We analyze the quality and stability of on-device streaming end-to-end (E2E) ASR models. We first introduce a novel set of metrics that quantify instability at the word and segment levels. We study the impact of several model training techniques that improve E2E model quality but degrade model stability. We categorize the causes of instability and explore various solutions to mitigate them in a streaming E2E ASR system.
Index Terms: ASR, stability, end-to-end, text normalization, on-device, RNN-T
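The kind of instability described in the abstract can be illustrated with a small Python sketch. It is an illustration only: the abstract does not define the paper's metrics, so the two measures below (`unstable_partial_ratio` and `revised_word_count`) are hypothetical stand-ins, and the example utterance, including the rewrite of "four" to "4", is an assumed scenario rather than data from the paper.

```python
from typing import List


def unstable_partial_ratio(partials: List[str], final: str) -> float:
    """Fraction of partial results that are NOT a word-level prefix of the final result.

    Hypothetical segment-style measure, not the paper's definition.
    """
    if not partials:
        return 0.0
    final_words = final.split()
    unstable = 0
    for partial in partials:
        words = partial.split()
        # A stable partial only ever gets extended, so it must be a prefix of the final.
        if words != final_words[: len(words)]:
            unstable += 1
    return unstable / len(partials)


def revised_word_count(partials: List[str]) -> int:
    """Number of already-shown words that a later partial changed or dropped.

    Hypothetical word-style measure, not the paper's definition.
    """
    revised = 0
    for prev, curr in zip(partials, partials[1:]):
        prev_words, curr_words = prev.split(), curr.split()
        # Length of the common word prefix between consecutive partials.
        common = 0
        for a, b in zip(prev_words, curr_words):
            if a != b:
                break
            common += 1
        # Every word of the previous partial past the common prefix was revised.
        revised += len(prev_words) - common
    return revised


# Assumed example: the recognizer first emits "and", then corrects it to "an",
# and later rewrites the spoken form "four" as the written form "4".
partials = [
    "set",
    "set and",
    "set an alarm",
    "set an alarm for",
    "set an alarm for four",
    "set an alarm for 4 pm",
]
final = "set an alarm for 4 pm"

print(unstable_partial_ratio(partials, final))
print(revised_word_count(partials))
```

In this assumed stream, two of the six partial results are later contradicted, and two displayed words are overwritten. A user watching the transcript would see the text flicker at exactly those points, which is the effect a stability metric aims to capture.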