Paper Title

Improving Low Resource Code-switched ASR using Augmented Code-switched TTS

Authors

Yash Sharma, Basil Abraham, Karan Taneja, Preethi Jyothi

Abstract

Building Automatic Speech Recognition (ASR) systems for code-switched speech has recently gained renewed attention due to the widespread use of speech technologies in multilingual communities worldwide. End-to-end ASR systems are a natural modeling choice because of their ease of use and superior performance in monolingual settings. However, it is well known that end-to-end systems require large amounts of labeled speech. In this work, we investigate improving code-switched ASR in low-resource settings via data augmentation using code-switched text-to-speech (TTS) synthesis. We propose two targeted techniques to effectively leverage TTS speech samples: 1) Mixup, an existing technique that creates new training samples via linear interpolation of existing samples, applied to TTS and real speech samples, and 2) a new loss function, used in conjunction with TTS samples, to encourage code-switched predictions. We report significant improvements in ASR performance, achieving absolute word error rate (WER) reductions of up to 5%, and a measurable improvement in code switching using our proposed techniques on a Hindi-English code-switched ASR task.
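The Mixup step named in the abstract, linear interpolation between a real and a TTS sample, can be illustrated with a minimal sketch. The abstract does not specify how the features or the targets are combined, so everything here is an assumption: the function names `mixup_features` and `mixup_loss` are hypothetical, the two inputs are taken to be equally shaped feature matrices (frames x feature dimensions), the mixing weight is drawn from a Beta(alpha, alpha) distribution as in the original Mixup formulation, and the two transcripts are handled by weighting their per-target losses with the same weight.

```python
import numpy as np


def mixup_features(real_feats, tts_feats, alpha=0.4, rng=None):
    """Linearly interpolate two equally shaped feature matrices.

    Assumption: both inputs were already padded or cropped to the same
    shape; the abstract does not say how lengths are aligned.
    """
    if real_feats.shape != tts_feats.shape:
        raise ValueError("mixup needs equally shaped inputs")
    rng = np.random.default_rng() if rng is None else rng
    # Mixing weight in [0, 1], drawn from Beta(alpha, alpha).
    lam = float(rng.beta(alpha, alpha))
    mixed = lam * real_feats + (1.0 - lam) * tts_feats
    return mixed, lam


def mixup_loss(loss_real, loss_tts, lam):
    """Weight the losses of the two targets with the same mixing weight.

    Assumption: each loss was computed on the mixed input against the
    real and the TTS transcript respectively.
    """
    return lam * loss_real + (1.0 - lam) * loss_tts
```

In a training loop one would call `mixup_features` on a paired real and TTS utterance, run the model on the mixed features, compute the loss once per transcript, and combine the two values with `mixup_loss`. The value of `alpha` is a tunable choice, not a setting reported in the abstract.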
