Paper Title
Efficient End-to-End Speech Recognition Using Performers in Conformers
Paper Authors
Paper Abstract
On-device end-to-end speech recognition places high demands on model efficiency. Most prior work improves efficiency by reducing model size. We propose to reduce the complexity of the model architecture in addition to its size. More specifically, we reduce the floating-point operations in the conformer by replacing the transformer module with a performer. The proposed attention-based efficient end-to-end speech recognition model yields competitive performance on the LibriSpeech corpus with 10 million parameters and linear computational complexity. It also outperforms previous lightweight end-to-end models by about 20% relative in word error rate.
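The abstract does not spell out how the performer reaches linear complexity. The following is a minimal NumPy sketch, assuming the positive random feature approximation of softmax attention commonly associated with performers. The function names, the feature count, and the single-head, unbatched layout are illustrative assumptions and are not taken from the paper. The point it shows: the key-value summary is computed once, so no n-by-n attention matrix is ever formed.

```python
import numpy as np


def softmax_attention(q, k, v):
    """Reference softmax attention: builds an explicit (n, n) score matrix."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


def positive_random_features(x, proj):
    """Map x (n, d) to positive features (n, m).

    With rows of `proj` drawn from N(0, I), the expected dot product of
    two feature vectors equals exp(x_i . x_j).
    """
    m = proj.shape[0]
    half_sq_norm = 0.5 * np.sum(x ** 2, axis=-1, keepdims=True)
    return np.exp(x @ proj.T - half_sq_norm) / np.sqrt(m)


def performer_attention(q, k, v, proj):
    """Approximate softmax attention in time linear in sequence length n.

    Assumption: a single head, no masking, no batching (illustrative only).
    """
    d = q.shape[-1]
    scale = d ** -0.25  # split the 1/sqrt(d) scaling between q and k
    q_feat = positive_random_features(q * scale, proj)  # (n, m)
    k_feat = positive_random_features(k * scale, proj)  # (n, m)
    kv = k_feat.T @ v                          # (m, d_v), cost O(n * m * d_v)
    normalizer = q_feat @ k_feat.sum(axis=0)   # (n,), row sums of the implicit weights
    return (q_feat @ kv) / normalizer[:, None]
```

As a usage example, drawing `proj` with `np.random.default_rng(0).standard_normal((m, d))` and passing the same `q`, `k`, `v` to both functions lets the two outputs be compared directly. The approximation is stochastic, and its quality depends on the number of random features `m` and on the magnitude of the queries and keys.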