Paper Title
Improving Device Directedness Classification of Utterances with Semantic Lexical Features
Paper Authors
Paper Abstract
User interactions with personal assistants such as Alexa, Google Home, and Siri are typically initiated by a wake term, or wakeword. Several personal assistants feature "follow-up" modes that let users make additional interactions without needing a wakeword. For the system to respond only when appropriate, and to ignore speech not intended for it, utterances must be classified as device-directed or non-device-directed. State-of-the-art systems have largely used acoustic features for this task, while others have used only lexical features or have added LM-based lexical features. We propose a directedness classifier that combines semantic lexical features with a lightweight acoustic feature and show that it is effective in classifying directedness. The mixed-domain lexical and acoustic feature model achieves a 14% relative reduction in equal error rate (EER) over a state-of-the-art acoustic-only baseline model. Finally, we successfully apply transfer learning and semi-supervised learning to the model to improve accuracy even further.
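To make the headline metric concrete, the following is a minimal sketch of how an equal error rate and a relative reduction could be computed for a binary directedness classifier. It is not the authors' evaluation code: the function names, the toy scores and labels, and the baseline and improved EER values are all hypothetical, chosen only to illustrate the arithmetic behind a figure such as "14% relative reduction".

```python
def equal_error_rate(scores, labels):
    """Approximate the EER by sweeping thresholds over the observed scores.

    scores: classifier outputs, higher means "more likely device-directed".
    labels: 1 for device-directed, 0 for non-device-directed.
    Returns the mean of the false-accept and false-reject rates at the
    threshold where the two rates are closest.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_gap, eer = float("inf"), None
    for threshold in sorted(set(scores)):
        # An utterance is accepted as device-directed when score >= threshold.
        false_accepts = sum(
            1 for s, y in zip(scores, labels) if s >= threshold and y == 0
        )
        false_rejects = sum(
            1 for s, y in zip(scores, labels) if s < threshold and y == 1
        )
        far = false_accepts / negatives
        frr = false_rejects / positives
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


def relative_reduction(baseline, improved):
    """Fraction by which `improved` lowers the error relative to `baseline`."""
    return (baseline - improved) / baseline


# Hypothetical toy data, not taken from the paper.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]
toy_eer = equal_error_rate(toy_scores, toy_labels)

# Hypothetical EER values chosen only to show what a 14% relative drop means.
example_reduction = relative_reduction(baseline=0.100, improved=0.086)
```

Note that a relative reduction is measured against the baseline error, so in this illustration a drop from an EER of 10.0% to 8.6% counts as a 14% relative reduction even though the absolute change is only 1.4 percentage points.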