Paper Title

MASRI-HEADSET: A Maltese Corpus for Speech Recognition

Authors

Carlos Mena, Albert Gatt, Andrea DeMarco, Claudia Borg, Lonneke van der Plas, Amanda Muscat, Ian Padovani

Abstract

Maltese, the national language of Malta, is spoken by approximately 500,000 people. Speech processing for Maltese is still in its early stages of development. In this paper, we present the first spoken Maltese corpus designed purposely for Automatic Speech Recognition (ASR). The MASRI-HEADSET corpus was developed by the MASRI project at the University of Malta. It consists of 8 hours of speech paired with text, recorded by using short text snippets in a laboratory environment. The speakers were recruited from different geographical locations all over the Maltese islands, and were roughly evenly distributed by gender. This paper also presents some initial results achieved in baseline experiments for Maltese ASR using Sphinx and Kaldi. The MASRI-HEADSET Corpus is publicly available for research/academic purposes.
