Paper Title

Learning Robust and Multilingual Speech Representations

Paper Authors

Kazuya Kawakami, Luyu Wang, Chris Dyer, Phil Blunsom, Aaron van den Oord

Abstract

Unsupervised speech representation learning has shown remarkable success at finding representations that correlate with phonetic structures and improve downstream speech recognition performance. However, most research has been focused on evaluating the representations in terms of their ability to improve the performance of speech recognition systems on read English (e.g. Wall Street Journal and LibriSpeech). This evaluation methodology overlooks two important desiderata that speech representations should have: robustness to domain shifts and transferability to other languages. In this paper we learn representations from up to 8000 hours of diverse and noisy speech data and evaluate the representations by looking at their robustness to domain shifts and their ability to improve recognition performance in many languages. We find that our representations confer significant robustness advantages to the resulting recognition systems: we see significant improvements in out-of-domain transfer relative to baseline feature sets and the features likewise provide improvements in 25 phonetically diverse languages including tonal languages and low-resource languages.
