Paper Title

Thai Wav2Vec2.0 with CommonVoice V8

Authors

Wannaphong Phatthiyaphaibun, Chompakorn Chaksangchaichot, Peerat Limkonchotiwat, Ekapol Chuangsuwanich, Sarana Nutanong

Abstract

Recently, Automatic Speech Recognition (ASR), a system that converts audio into text, has attracted considerable attention in the machine learning community. As a result, many publicly available models have been released on HuggingFace. However, most of these ASR models are for English; only a few are available for Thai. Additionally, most Thai ASR models are closed-source, and the existing open-source models lack robustness. To address this problem, we fine-tune a pre-trained XLSR-Wav2Vec model on the Thai CommonVoice corpus V8 and train a trigram language model to boost the performance of our ASR model. We hope that our models will be beneficial to individuals and the ASR community in Thailand.
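The abstract says a trigram language model is trained to boost the ASR output but does not describe how it is built or combined with the acoustic model. The sketch below is only an illustration of the general idea, not the authors' implementation: a word-level trigram model with add-k smoothing, and an n-best rescoring step in the common "shallow fusion" form `acoustic + alpha * LM + beta * length`. The names `TrigramLM`, `rescore`, `alpha`, `beta` and `k` are hypothetical, and since Thai is written without spaces between words, the input is assumed to be already word-segmented.

```python
import math
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

BOS, EOS = "<s>", "</s>"


class TrigramLM:
    """Hypothetical add-k smoothed word trigram model (illustration only)."""

    def __init__(self, sentences: Iterable[Sequence[str]], k: float = 0.1) -> None:
        self.k = k
        self.tri: Counter = Counter()  # counts of (w1, w2, w3)
        self.ctx: Counter = Counter()  # counts of the context (w1, w2)
        vocab = set()
        for sent in sentences:
            # Pad with two BOS markers so the first word has a full context.
            toks = [BOS, BOS, *sent, EOS]
            vocab.update(toks)
            for w1, w2, w3 in zip(toks, toks[1:], toks[2:]):
                self.tri[(w1, w2, w3)] += 1
                self.ctx[(w1, w2)] += 1
        vocab.discard(BOS)  # BOS is never predicted as a next word
        self.vocab_size = len(vocab) + 1  # +1 bucket for any unseen word

    def logprob(self, w1: str, w2: str, w3: str) -> float:
        # Add-k smoothing: unseen trigrams still get a small nonzero probability.
        num = self.tri[(w1, w2, w3)] + self.k
        den = self.ctx[(w1, w2)] + self.k * self.vocab_size
        return math.log(num / den)

    def score(self, sent: Sequence[str]) -> float:
        # Total log-probability of a word sequence, including the EOS transition.
        toks = [BOS, BOS, *sent, EOS]
        return sum(self.logprob(a, b, c) for a, b, c in zip(toks, toks[1:], toks[2:]))


def rescore(hyps: List[Tuple[List[str], float]], lm: TrigramLM,
            alpha: float = 0.5, beta: float = 1.0) -> Tuple[List[str], float]:
    """Pick the (words, acoustic_logprob) hypothesis with the best fused score."""
    def fused(h: Tuple[List[str], float]) -> float:
        words, acoustic = h
        return acoustic + alpha * lm.score(words) + beta * len(words)
    return max(hyps, key=fused)


# Toy, pre-segmented Thai corpus (hypothetical data, not from the paper).
corpus = [
    ["ฉัน", "ชอบ", "กิน", "ข้าว"],
    ["ฉัน", "ชอบ", "อ่าน", "หนังสือ"],
    ["เขา", "ชอบ", "กิน", "ข้าว"],
]
lm = TrigramLM(corpus)

# Two hypotheses with equal acoustic scores; the LM breaks the tie.
best = rescore([(["ฉัน", "ชอบ", "กิน", "ขาว"], -5.0),
                (["ฉัน", "ชอบ", "กิน", "ข้าว"], -5.0)], lm)
```

Here the two hypotheses differ only in "ขาว" versus "ข้าว", which an acoustic model may confuse; the trigram model favours the word sequence it has seen in training. In a real system the weights `alpha` and `beta` would be tuned on held-out data, and the LM would typically be applied inside beam search rather than only on a final n-best list.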