Multi-QuartzNet: Multi-Resolution Convolution for Speech Recognition with Multi-Layer Feature Fusion

Paper Title

Multi-QuartzNet: Multi-Resolution Convolution for Speech Recognition with Multi-Layer Feature Fusion

Authors

Luo, Jian; Wang, Jianzong; Cheng, Ning; Jiang, Guilin; Xiao, Jing

Abstract

In this paper, we propose an end-to-end speech recognition network based on NVIDIA's previous QuartzNet model. To improve model performance, we design three components: (1) a Multi-Resolution Convolution Module, which replaces the original 1D time-channel separable convolution with multi-stream convolutions, where each stream has a unique dilated stride on its convolutional operations; (2) a Channel-Wise Attention Module, which calculates the attention weight of each convolutional stream by spatial channel-wise pooling; and (3) a Multi-Layer Feature Fusion Module, which reweights each convolutional block by global multi-layer feature maps. Our experiments demonstrate that the Multi-QuartzNet model achieves a character error rate (CER) of 6.77% on the AISHELL-1 dataset, which outperforms the original QuartzNet and is close to the state-of-the-art result.
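The first two components can be illustrated with a small single-channel sketch: several convolution streams with different dilations run over the same input, each stream is pooled to one score, and the scores are turned into weights that fuse the streams. This is only a rough illustration under stated assumptions, not the authors' implementation. The function names (`dilated_conv1d`, `multi_resolution_conv`), the shared kernel, the mean pooling, and the softmax weighting are all hypothetical choices; the model described in the abstract uses learned time-channel separable convolutions and a learned attention module.

```python
import math


def dilated_conv1d(x, kernel, dilation):
    """Zero-padded 1D convolution that keeps the input length.

    Assumes an odd-length kernel so the output stays centred on each frame.
    """
    half = (len(kernel) // 2) * dilation
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + j * dilation - half
            if 0 <= idx < len(x):  # positions outside the input count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_resolution_conv(x, kernel, dilations):
    """Fuse several dilated streams with attention-style weights.

    Assumption: mean pooling over time plus softmax stands in for the
    learned channel-wise attention described in the abstract.
    """
    streams = [dilated_conv1d(x, kernel, d) for d in dilations]
    scores = [sum(s) / len(s) for s in streams]  # one pooled score per stream
    weights = softmax(scores)
    fused = [
        sum(w * s[t] for w, s in zip(weights, streams))
        for t in range(len(x))
    ]
    return fused, weights


if __name__ == "__main__":
    frames = [1.0, 2.0, 3.0, 4.0, 5.0]
    fused, weights = multi_resolution_conv(frames, [1.0, 0.0, 1.0], [1, 2])
    print(fused, weights)
```

Larger dilations let a stream see frames that are further apart without increasing the kernel size, which is the sense in which the streams operate at different resolutions.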

Paper Title