A Highly Adaptive Acoustic Model for Accurate Multi-Dialect Speech Recognition

Paper Title

A Highly Adaptive Acoustic Model for Accurate Multi-Dialect Speech Recognition

Authors

Sanghyun Yoo, Inchul Song, Yoshua Bengio

Abstract

Despite the success of deep learning in speech recognition, multi-dialect speech recognition remains a difficult problem. Although dialect-specific acoustic models are known to perform well in general, they are not easy to maintain when dialect-specific data is scarce and the number of dialects for each language is large. Therefore, a single unified acoustic model (AM) that generalizes well to many dialects has been in demand. In this paper, we propose a novel acoustic modeling technique for accurate multi-dialect speech recognition with a single AM. Our proposed AM is dynamically adapted based on both dialect information and its internal representation, which results in a highly adaptive AM for handling multiple dialects simultaneously. We also propose a simple but effective training method to deal with unseen dialects. The experimental results on large-scale speech datasets show that the proposed AM outperforms all the previous ones, reducing word error rates (WERs) by 8.11% relative compared to a single all-dialects AM and by 7.31% relative compared to dialect-specific AMs.
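The abstract states that the AM is adapted using both dialect information and its own internal representation. The sketch below shows one plausible way such conditioning can work, assuming a feature-wise linear modulation (FiLM) style layer in which a per-dimension scale and shift are predicted from a dialect one-hot code concatenated with the time-averaged hidden states. The layer design, the extra "unknown dialect" slot, the label-hiding training trick, and all names (`DialectAdaptiveLayer`, `maybe_hide_dialect`) are illustrative assumptions, not the authors' architecture or training method. It uses plain Python with random, untrained weights.

```python
import random


def matvec(w, x):
    """Return the matrix-vector product w @ x for w given as a list of rows."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


class DialectAdaptiveLayer:
    """Hypothetical FiLM-style adaptive layer (illustrative, not the paper's design).

    Each frame h is transformed as gamma * h + beta, where gamma and beta are
    predicted from [dialect one-hot ; time-averaged hidden states], so the
    adaptation depends on both the dialect label and the layer's own input.
    """

    def __init__(self, hidden_dim, num_dialects, seed=0):
        rng = random.Random(seed)
        self.num_dialects = num_dialects
        # One extra one-hot slot stands for "unknown dialect" (an assumption).
        cond_dim = (num_dialects + 1) + hidden_dim
        self.w_gamma = [[rng.gauss(0.0, 0.1) for _ in range(cond_dim)] for _ in range(hidden_dim)]
        self.w_beta = [[rng.gauss(0.0, 0.1) for _ in range(cond_dim)] for _ in range(hidden_dim)]

    def dialect_vector(self, dialect_id):
        """One-hot dialect code; dialect_id=None selects the 'unknown' slot."""
        v = [0.0] * (self.num_dialects + 1)
        v[self.num_dialects if dialect_id is None else dialect_id] = 1.0
        return v

    def __call__(self, frames, dialect_id):
        """frames: list of T hidden vectors, each of length hidden_dim."""
        dim = len(frames[0])
        # Summary of the internal representation: mean over time.
        summary = [sum(f[i] for f in frames) / len(frames) for i in range(dim)]
        cond = self.dialect_vector(dialect_id) + summary
        # The 1.0 offset keeps the layer close to identity for small weights.
        gamma = [1.0 + g for g in matvec(self.w_gamma, cond)]
        beta = matvec(self.w_beta, cond)
        return [[g * h + b for g, h, b in zip(gamma, f, beta)] for f in frames]


def maybe_hide_dialect(dialect_id, p, rng):
    """Hypothetical training trick: hide the label with probability p so the
    'unknown dialect' code is also trained and can be used for unseen dialects."""
    return None if rng.random() < p else dialect_id


layer = DialectAdaptiveLayer(hidden_dim=4, num_dialects=3)
frames = [[0.1, -0.2, 0.3, 0.0], [0.5, 0.1, -0.1, 0.2]]
out_known = layer(frames, dialect_id=1)       # adapted for dialect 1
out_unknown = layer(frames, dialect_id=None)  # fallback for an unseen dialect
print(len(out_known), len(out_known[0]))  # 2 4
```

Because the scale and shift depend on the averaged hidden states as well as the dialect code, two utterances with the same dialect label can still be adapted differently. The extra "unknown" slot gives the layer a well-defined input when no dialect label is available, which is one simple way to handle unseen dialects under the assumptions above.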
