Improving Device Directedness Classification of Utterances with Semantic Lexical Features

Paper title

Improving Device Directedness Classification of Utterances with Semantic Lexical Features

Authors

Gillespie, Kellen; Konstantakopoulos, Ioannis C.; Guo, Xingzhi; Vasudevan, Vishal Thanvantri; Sethy, Abhinav

Abstract

User interactions with personal assistants like Alexa, Google Home and Siri are typically initiated by a wake term or wakeword. Several personal assistants feature "follow-up" modes that allow users to make additional interactions without the need for a wakeword. For the system to respond only when appropriate and to ignore speech not intended for it, utterances must be classified as device-directed or non-device-directed. State-of-the-art systems have largely used acoustic features for this task, while others have used only lexical features or have added LM-based lexical features. We propose a directedness classifier that combines semantic lexical features with a lightweight acoustic feature and show that it is effective in classifying directedness. The mixed-domain lexical and acoustic feature model achieves a 14% relative reduction in equal error rate (EER) over a state-of-the-art acoustic-only baseline model. Finally, we successfully apply transfer learning and semi-supervised learning to the model to improve accuracy even further.
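The headline result is stated as a relative EER reduction. The EER is the error rate at the decision threshold where the false accept rate (non-directed speech treated as directed) equals the false reject rate (directed speech ignored), and a 14% relative reduction means EER_new = 0.86 × EER_baseline. The sketch below is illustrative only and is not the authors' evaluation code. It assumes a classifier that outputs one score per utterance (higher = more likely device-directed) and binary labels, and it approximates the EER by sweeping thresholds over the observed scores. The function names `equal_error_rate` and `relative_reduction` are hypothetical.

```python
from typing import Sequence, Tuple


def equal_error_rate(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Approximate the EER of a directedness classifier (hypothetical helper).

    labels: 1 = device-directed, 0 = non-device-directed.
    An utterance is accepted as device-directed when score >= threshold.
    Returns (eer, threshold) at the threshold where FAR and FRR are closest.
    """
    directed = [s for s, y in zip(scores, labels) if y == 1]
    non_directed = [s for s, y in zip(scores, labels) if y == 0]
    if not directed or not non_directed:
        raise ValueError("need both directed and non-directed examples")

    best = None  # (|FAR - FRR|, EER estimate, threshold)
    for t in sorted(set(scores)):
        # False accept: non-directed speech the assistant would respond to.
        far = sum(s >= t for s in non_directed) / len(non_directed)
        # False reject: directed speech the assistant would ignore.
        frr = sum(s < t for s in directed) / len(directed)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


def relative_reduction(baseline_eer: float, new_eer: float) -> float:
    """Relative EER reduction, e.g. 0.14 for a 14% relative improvement."""
    return (baseline_eer - new_eer) / baseline_eer
```

For example, with scores [0.9, 0.8, 0.7, 0.4, 0.3, 0.6] and labels [1, 1, 0, 1, 0, 0], the false accept and false reject rates are both 1/3 at threshold 0.7, so the estimated EER is 1/3. Production evaluations usually interpolate along the ROC curve instead of picking the nearest observed threshold. The discrete sweep is kept here for readability.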
