Paper title
FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech
Paper authors
Abstract
We introduce FLEURS, the Few-shot Learning Evaluation of Universal Representations of Speech benchmark. FLEURS is an n-way parallel speech dataset in 102 languages built on top of the machine translation FLoRes-101 benchmark, with approximately 12 hours of speech supervision per language. FLEURS can be used for a variety of speech tasks, including Automatic Speech Recognition (ASR), Speech Language Identification (Speech LangID), Translation and Retrieval. In this paper, we provide baselines for the tasks based on multilingual pre-trained models like mSLAM. The goal of FLEURS is to enable speech technology in more languages and catalyze research in low-resource speech understanding.