Paper title
Speaker Recognition Based on Deep Learning: An Overview
Paper authors
Abstract
Speaker recognition is the task of identifying persons from their voices. Recently, deep learning has dramatically changed speaker recognition. However, there is a lack of comprehensive reviews of this progress. In this paper, we review several major subtasks of speaker recognition, including speaker verification, identification, diarization, and robust speaker recognition, with a focus on deep-learning-based methods. Because the major advantage of deep learning over conventional methods is its representation ability, which allows it to produce highly abstract embedding features from utterances, we first pay close attention to deep-learning-based speaker feature extraction, covering the inputs, network structures, temporal pooling strategies, and objective functions, which are the fundamental components of many speaker recognition subtasks. Then, we give an overview of speaker diarization, with an emphasis on recent supervised, end-to-end, and online diarization. Finally, we survey robust speaker recognition from the perspectives of domain adaptation and speech enhancement, which are the two major approaches to dealing with domain mismatch and noise problems. Popular and recently released corpora are listed at the end of the paper.
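To make the notion of temporal pooling mentioned in the abstract concrete, the following is a minimal sketch, not the authors' method. It assumes mean-and-standard-deviation ("statistics") pooling as one possible pooling strategy, and cosine similarity as one possible way to compare two embeddings for verification. The function names, the feature dimension, and the random frame-level features are all hypothetical stand-ins for the output of a trained network.

```python
import numpy as np


def statistics_pooling(frames: np.ndarray) -> np.ndarray:
    """Collapse a (num_frames, feat_dim) matrix into one fixed-length vector
    by concatenating the per-dimension mean and standard deviation."""
    mean = frames.mean(axis=0)
    std = frames.std(axis=0)
    return np.concatenate([mean, std])


def cosine_score(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embeddings (assumed scoring back-end)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


# Hypothetical frame-level features for two utterances of different lengths.
rng = np.random.default_rng(0)
utt_a = rng.normal(size=(200, 8))  # 200 frames, 8-dimensional features
utt_b = rng.normal(size=(350, 8))  # 350 frames, 8-dimensional features

emb_a = statistics_pooling(utt_a)
emb_b = statistics_pooling(utt_b)

# Both embeddings have the same length regardless of utterance duration.
print(emb_a.shape, emb_b.shape)
print(cosine_score(emb_a, emb_b))
```

The point of the sketch is only that pooling maps variable-length utterances to fixed-length vectors, which is what allows a single scoring function to compare them. In a real system the frame-level features would come from a trained network rather than a random generator.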