FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech

Paper Title

FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech

Authors

Alexis Conneau, Min Ma, Simran Khanuja, Yu Zhang, Vera Axelrod, Siddharth Dalmia, Jason Riesa, Clara Rivera, Ankur Bapna

Abstract

We introduce FLEURS, the Few-shot Learning Evaluation of Universal Representations of Speech benchmark. FLEURS is an n-way parallel speech dataset in 102 languages built on top of the machine translation FLoRes-101 benchmark, with approximately 12 hours of speech supervision per language. FLEURS can be used for a variety of speech tasks, including Automatic Speech Recognition (ASR), Speech Language Identification (Speech LangID), Translation and Retrieval. In this paper, we provide baselines for the tasks based on multilingual pre-trained models like mSLAM. The goal of FLEURS is to enable speech technology in more languages and catalyze research in low-resource speech understanding.

Paper Title