Paper Title

Multi-QuartzNet: Multi-Resolution Convolution for Speech Recognition with Multi-Layer Feature Fusion

Authors

Jian Luo, Jianzong Wang, Ning Cheng, Guilin Jiang, Jing Xiao

Abstract

In this paper, we propose an end-to-end speech recognition network based on NVIDIA's earlier QuartzNet model. To improve model performance, we design three components: (1) a Multi-Resolution Convolution Module, which replaces the original 1D time-channel separable convolution with multi-stream convolutions, each stream using its own dilation rate; (2) a Channel-Wise Attention Module, which calculates the attention weight of each convolutional stream by spatial channel-wise pooling; and (3) a Multi-Layer Feature Fusion Module, which reweights each convolutional block using global multi-layer feature maps. Our experiments show that the Multi-QuartzNet model achieves a character error rate (CER) of 6.77% on the AISHELL-1 dataset, which outperforms the original QuartzNet and is close to the state-of-the-art result.
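
As a rough illustration of components (1) and (2), the sketch below runs several dilated 1D convolution streams over the same input and fuses them with per-channel attention weights obtained by pooling each stream over time. It is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: the function names, the shared depthwise kernel, mean pooling, and the softmax over streams are all hypothetical choices made for illustration, since the abstract does not specify them.

```python
import math

def dilated_conv1d(x, kernel, dilation):
    """Zero-padded, length-preserving 1D convolution of one channel.
    Assumes an odd kernel length so the kernel is centered."""
    half = (len(kernel) // 2) * dilation
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            j = t - half + i * dilation
            if 0 <= j < len(x):  # positions outside the signal count as zero
                acc += w * x[j]
        out.append(acc)
    return out

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def multi_resolution_conv(x, kernel, dilations):
    """x is a list of channels, each a list of floats over time.
    Returns the fused output and the attention weights per channel."""
    # One stream per dilation rate (assumption: all streams share one kernel).
    streams = [[dilated_conv1d(ch, kernel, d) for ch in x] for d in dilations]
    fused, weights = [], []
    for c in range(len(x)):
        # Pool each stream's channel over time to get one score per stream.
        pooled = [sum(s[c]) / len(s[c]) for s in streams]
        w = softmax(pooled)  # assumption: softmax normalizes across streams
        weights.append(w)
        fused.append([
            sum(w[k] * streams[k][c][t] for k in range(len(streams)))
            for t in range(len(x[c]))
        ])
    return fused, weights

# Hypothetical usage: two channels, four time steps, two dilation rates.
features = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]
fused, weights = multi_resolution_conv(features, [0.25, 0.5, 0.25], [1, 2])
```

Larger dilation rates widen the receptive field of a stream without adding parameters, which is why mixing several rates gives the block access to multiple temporal resolutions at once.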