Paper Title
Streaming Punctuation for Long-form Dictation with Transformers
Paper Authors
Abstract
While speech recognition Word Error Rate (WER) has reached human parity for English, long-form dictation scenarios still suffer from segmentation and punctuation problems resulting from irregular pausing patterns or slow speakers. Transformer sequence tagging models are effective at capturing long bi-directional context, which is crucial for automatic punctuation. Automatic Speech Recognition (ASR) production systems, however, are constrained by real-time requirements, making it hard to incorporate the right context when making punctuation decisions. In this paper, we propose a streaming approach for punctuation or re-punctuation of ASR output using dynamic decoding windows and measure its impact on punctuation and segmentation accuracy across scenarios. The new system tackles over-segmentation issues, improving segmentation F0.5-score by 13.9%. Streaming punctuation achieves an average BLEU-score improvement of 0.66 for the downstream task of Machine Translation (MT).
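The abstract reports segmentation quality as an F0.5 score rather than the more common F1. As a minimal sketch of why that choice suits over-segmentation, the snippet below computes the standard F-beta score, where beta < 1 weights precision more heavily than recall. The function name `f_beta` and the precision and recall values used in the example are illustrative assumptions, not figures or code from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score: (1 + beta^2) * P * R / (beta^2 * P + R).

    With beta = 0.5, precision counts more than recall, so a system that
    inserts many spurious sentence boundaries (low precision) is penalized
    more strongly than it would be under F1.
    """
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    denominator = beta ** 2 * precision + recall
    if denominator == 0.0:
        # Both precision and recall are zero; define the score as 0.
        return 0.0
    return (1.0 + beta ** 2) * precision * recall / denominator


if __name__ == "__main__":
    # Hypothetical over-segmenting system: many false boundaries, few misses.
    p, r = 0.5, 0.9
    print("F0.5:", f_beta(p, r, beta=0.5))
    print("F1:  ", f_beta(p, r, beta=1.0))
```

For the hypothetical system above, the F0.5 value comes out lower than the F1 value, which reflects how the metric emphasizes spurious boundaries over missed ones.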