Streaming Punctuation for Long-form Dictation with Transformers

Paper title

Streaming Punctuation for Long-form Dictation with Transformers

Authors

Piyush Behre, Sharman Tan, Padma Varadharajan, Shuangyu Chang

Abstract

While speech recognition Word Error Rate (WER) has reached human parity for English, long-form dictation scenarios still suffer from segmentation and punctuation problems resulting from irregular pausing patterns or slow speakers. Transformer sequence tagging models are effective at capturing long bi-directional context, which is crucial for automatic punctuation. Automatic Speech Recognition (ASR) production systems, however, are constrained by real-time requirements, making it hard to incorporate the right context when making punctuation decisions. In this paper, we propose a streaming approach for punctuation or re-punctuation of ASR output using dynamic decoding windows and measure its impact on punctuation and segmentation accuracy across scenarios. The new system tackles over-segmentation issues, improving segmentation F0.5-score by 13.9%. Streaming punctuation achieves an average BLEU-score improvement of 0.66 for the downstream task of Machine Translation (MT).
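
The abstract reports segmentation quality as an F0.5-score without defining it. As a reading aid, here is a minimal sketch of the standard F-beta formula, assuming the paper uses the usual definition in which beta = 0.5 weights precision more heavily than recall. The function name `f_beta` and the sample precision and recall values are hypothetical and are not taken from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score: (1 + beta^2) * P * R / (beta^2 * P + R).

    With beta < 1 (for example 0.5), precision counts more than recall.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        # Both precision and recall are zero, so the score is zero.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Illustrative values only; these are not results from the paper.
print(round(f_beta(0.8, 0.5), 3))
```

Under this definition, a system that inserts too many segment boundaries (low precision) is penalized more than one that misses a few, which is consistent with the abstract's focus on over-segmentation.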

Paper title