Paper Title


Monotonic segmental attention for automatic speech recognition

Authors

Albert Zeyer, Robin Schmitt, Wei Zhou, Ralf Schlüter, Hermann Ney

Abstract


We introduce a novel segmental-attention model for automatic speech recognition. We restrict the decoder attention to segments to avoid quadratic runtime of global attention, better generalize to long sequences, and eventually enable streaming. We directly compare global-attention and different segmental-attention modeling variants. We develop and compare two separate time-synchronous decoders, one specifically taking the segmental nature into account, yielding further improvements. Using time-synchronous decoding for segmental models is novel and a step towards streaming applications. Our experiments show the importance of a length model to predict the segment boundaries. The final best segmental-attention model using segmental decoding performs better than global-attention, in contrast to other monotonic attention approaches in the literature. Further, we observe that the segmental model generalizes much better to long sequences of up to several minutes.
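The core idea in the abstract, restricting the decoder's attention from all encoder frames to a bounded segment so the cost per decoder step no longer grows with the full sequence length, can be illustrated with a minimal sketch. This is not the authors' implementation; the dot-product scoring, the NumPy setup, and the hard segment boundaries `seg_start`/`seg_end` are illustrative assumptions (in the paper, such boundaries would come from a learned length model).

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def global_attention(query, encoder_states):
    """Global attention: scores and attends over all T encoder frames,
    so each decoder step costs O(T)."""
    scores = encoder_states @ query      # (T,)
    weights = softmax(scores)
    return weights @ encoder_states      # context vector, shape (d,)

def segmental_attention(query, encoder_states, seg_start, seg_end):
    """Segmental attention: attends only to frames [seg_start, seg_end),
    so the cost per step is bounded by the segment length, not T."""
    segment = encoder_states[seg_start:seg_end]
    scores = segment @ query
    weights = softmax(scores)
    return weights @ segment

rng = np.random.default_rng(0)
T, d = 100, 8                            # hypothetical: 100 frames, dim 8
enc = rng.normal(size=(T, d))
q = rng.normal(size=d)

ctx_global = global_attention(q, enc)          # looks at all 100 frames
ctx_seg = segmental_attention(q, enc, 40, 50)  # looks at 10 frames only
```

With the segment widened to the whole sequence, `segmental_attention` reduces exactly to `global_attention`, which is why the two variants can be compared directly as the abstract describes.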
