Paper Title
Cross-modal Speaker Verification and Recognition: A Multilingual Perspective
Paper Authors
Paper Abstract
Recent years have seen a surge of interest in finding associations between faces and voices in cross-modal biometric applications, alongside speaker recognition. Inspired by this, we introduce the challenging task of establishing the association between faces and voices across multiple languages spoken by the same set of persons. The aim of this paper is to answer two closely related questions: "Is the face-voice association language-independent?" and "Can a speaker be recognised irrespective of the spoken language?". These two questions are very important for understanding the effectiveness of, and boosting the development of, multilingual biometric systems. To answer them, we collected a Multilingual Audio-Visual dataset containing human speech clips of $154$ identities, with $3$ language annotations, extracted from various videos uploaded online. Extensive experiments on three splits of the proposed dataset have been performed to investigate and answer these novel research questions, clearly pointing out the relevance of the multilingual problem.