Acoustic span embeddings for multilingual query-by-example search

Title

Acoustic span embeddings for multilingual query-by-example search

Authors

Yushi Hu, Shane Settle, Karen Livescu

Abstract

Query-by-example (QbE) speech search is the task of matching spoken queries to utterances within a search collection. In low- or zero-resource settings, QbE search is often addressed with approaches based on dynamic time warping (DTW). Recent work has found that methods based on acoustic word embeddings (AWEs) can improve both performance and search speed. However, prior work on AWE-based QbE has primarily focused on English data and single-word queries. In this work, we generalize AWE training to spans of words, producing acoustic span embeddings (ASE), and explore the application of ASE to QbE with arbitrary-length queries in multiple unseen languages. We consider the commonly used setting where we have access to labeled data in other languages (in our case, several low-resource languages) distinct from the unseen test languages. We evaluate our approach on the QUESST 2015 QbE tasks, finding that multilingual ASE-based search is much faster than DTW-based search and outperforms the best previously published results on this task.
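
To make the contrast between the two search styles in the abstract concrete, here is a minimal Python sketch. It is not the authors' system: `toy_embed` is a hypothetical stand-in (simple mean pooling) for a learned span encoder, and the frame features, window size, and function names are all assumptions chosen for illustration. The DTW routine is the textbook recurrence, whose cost grows with the product of the two sequence lengths for every comparison, while the embedding route reduces each span to one fixed-size vector that is compared with a single cosine similarity.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]  # one acoustic feature vector (assumed representation)


def frame_dist(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_cost(query: Sequence[Frame], segment: Sequence[Frame]) -> float:
    """Textbook DTW alignment cost; O(len(query) * len(segment)) per comparison."""
    n, m = len(query), len(segment)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_dist(query[i - 1], segment[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def toy_embed(span: Sequence[Frame]) -> List[float]:
    """Hypothetical stand-in for a learned span encoder: mean-pool the frames.

    Assumes a non-empty span. A real ASE model would be a trained network.
    """
    dim = len(span[0])
    return [sum(f[k] for f in span) / len(span) for k in range(dim)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv) if nu and nv else 0.0


def embedding_search(query: Sequence[Frame], utterance: Sequence[Frame],
                     window: int, shift: int = 1) -> float:
    """Score an utterance by its best-matching sliding window in embedding space."""
    q = toy_embed(query)
    best = -1.0
    # If the utterance is shorter than the window, score it as a single span.
    for start in range(0, max(1, len(utterance) - window + 1), shift):
        span = utterance[start:start + window]
        best = max(best, cosine(q, toy_embed(span)))
    return best
```

In a setup like this, the window embeddings for the search collection could be computed once and reused across queries, which is one plausible reason embedding-based search can be faster than running DTW for every query and utterance pair.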
