Paper Title
Shifted Chunk Encoder for Transformer Based Streaming End-to-End ASR
Paper Authors
Paper Abstract
Currently, there are mainly three kinds of Transformer-encoder-based streaming End-to-End (E2E) Automatic Speech Recognition (ASR) approaches, namely time-restricted methods, chunk-wise methods, and memory-based methods. Generally, all of them have limitations in one or more of the following aspects: linear computational complexity, global context modeling, and parallel training. In this work, we aim to build a model that enjoys all three of these advantages for streaming Transformer ASR. In particular, we propose a shifted chunk mechanism for the chunk-wise Transformer which provides cross-chunk connections between chunks. The global context modeling ability of chunk-wise models can therefore be significantly enhanced while all of their original merits are inherited. We integrate this scheme with the chunk-wise Transformer and Conformer, and refer to the resulting models as SChunk-Transformer and SChunk-Conformer, respectively. Experiments on AISHELL-1 show that the SChunk-Transformer and SChunk-Conformer achieve CERs of 6.43% and 5.77%, respectively. Their linear complexity makes it possible to train them with large batches and to infer more efficiently. Our models significantly outperform their conventional chunk-wise counterparts, while remaining competitive, with only a 0.22 absolute CER drop, compared with U2, which has quadratic complexity. They also achieve better CERs than existing chunk-wise or memory-based methods such as HS-DACS and MMA. Code is released.
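To illustrate the idea behind the shifted chunk mechanism, the toy sketch below (not the paper's actual implementation) runs single-head self-attention within fixed-size chunks, and optionally rotates the sequence by half a chunk before chunking and rotates it back afterwards. Without the shift, no information crosses chunk boundaries; with the shift, positions near a boundary attend to neighbors in the adjacent chunk, which is how stacking shifted and unshifted layers spreads context globally. The function name, the half-chunk shift size, and the use of raw inputs as queries/keys/values are illustrative assumptions; the real SChunk layers use learned Q/K/V projections, multiple heads, and streaming-compatible masking.

```python
import math

def chunk_attention(x, chunk_size, shift=False):
    """Toy single-head chunk-wise self-attention.

    x: list of equal-length feature vectors (lists of floats).
    Each position attends only to positions inside its own chunk,
    giving linear complexity in sequence length. With shift=True the
    sequence is rotated left by half a chunk before chunking (and
    rotated back afterwards), so chunk boundaries move and information
    can flow across the original boundaries -- a sketch of the shifted
    chunk mechanism, not the paper's exact layer.
    """
    n, d = len(x), len(x[0])
    assert n % chunk_size == 0, "sketch assumes an even chunking"
    off = chunk_size // 2 if shift else 0
    y = x[off:] + x[:off]  # rotate left by `off`
    out = []
    for start in range(0, n, chunk_size):
        c = y[start:start + chunk_size]
        for q in c:
            # scaled dot-product scores against keys in the same chunk
            scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                      for k in c]
            m = max(scores)
            w = [math.exp(s - m) for s in scores]  # stable softmax
            z = sum(w)
            out.append([sum(w[j] * c[j][i] for j in range(len(c))) / z
                        for i in range(d)])
    return out[-off:] + out[:-off] if off else out  # rotate back
```

With `chunk_size=4` on an 8-frame sequence, perturbing frame 7 leaves the unshifted outputs for frames 0..3 untouched (no cross-chunk path), but changes the shifted output at frame 0, since the half-chunk rotation places frames 0 and 7 in the same chunk.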