Improving Low Resource Code-switched ASR using Augmented Code-switched TTS

Paper Title

Improving Low Resource Code-switched ASR using Augmented Code-switched TTS

Authors

Yash Sharma, Basil Abraham, Karan Taneja, Preethi Jyothi

Abstract

Building Automatic Speech Recognition (ASR) systems for code-switched speech has recently gained renewed attention due to the widespread use of speech technologies in multilingual communities worldwide. End-to-end ASR systems are a natural modeling choice due to their ease of use and superior performance in monolingual settings. However, it is well known that end-to-end systems require large amounts of labeled speech. In this work, we investigate improving code-switched ASR in low resource settings via data augmentation using code-switched text-to-speech (TTS) synthesis. We propose two targeted techniques to effectively leverage TTS speech samples: 1) Mixup, an existing technique to create new training samples via linear interpolation of existing samples, applied to TTS and real speech samples, and 2) a new loss function, used in conjunction with TTS samples, to encourage code-switched predictions. We report significant improvements in ASR performance achieving absolute word error rate (WER) reductions of up to 5%, and measurable improvement in code switching using our proposed techniques on a Hindi-English code-switched ASR task.
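
The Mixup step mentioned in the abstract, linear interpolation of a real and a TTS sample, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the Beta(alpha, alpha) sampling of the mixing weight (borrowed from the general Mixup technique), the zero-padding of the shorter utterance, and the weighting of the two per-transcript losses are all assumptions made for illustration.

```python
import numpy as np


def mixup_features(real_feats, tts_feats, alpha=0.5, rng=None):
    """Linearly interpolate two (frames, dims) acoustic feature matrices.

    Hypothetical helper: the shorter utterance is zero-padded so both
    inputs have the same number of frames before mixing.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Mixing weight in [0, 1]; drawing it from Beta(alpha, alpha) is an assumption.
    lam = float(rng.beta(alpha, alpha))
    n_frames = max(real_feats.shape[0], tts_feats.shape[0])

    def pad(x):
        # Append zero frames at the end; the feature dimension is untouched.
        return np.pad(x, ((0, n_frames - x.shape[0]), (0, 0)))

    mixed = lam * pad(real_feats) + (1.0 - lam) * pad(tts_feats)
    return mixed, lam


def mixup_loss(loss_real, loss_tts, lam):
    """Combine losses computed against each transcript with the same weight."""
    return lam * loss_real + (1.0 - lam) * loss_tts
```

In this sketch the mixed features would be passed through the ASR model once, the loss computed separately against the real and the TTS transcript, and the two values combined with `mixup_loss` using the same `lam`.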
