Paper title

Discriminatory and orthogonal feature learning for noise robust keyword spotting

Authors

Kim, Donghyeon; Ko, Kyungdeuk; Han, David K.; Ko, Hanseok

Abstract

Keyword Spotting (KWS) is an essential component of a smart device: it alerts the system when a user prompts it with a command. As these devices are typically constrained in computational and energy resources, the KWS model should be designed with a small footprint. In our previous work, we developed lightweight dynamic filters that extract a robust feature map in a noisy environment. The learning variables of the dynamic filter are jointly optimized with the KWS weights using Cross-Entropy (CE) loss. CE loss alone, however, is not sufficient for high performance when the signal-to-noise ratio (SNR) is low. To train the network for more robust performance in noisy environments, we introduce the LOw Variant Orthogonal (LOVO) loss. The LOVO loss is composed of a triplet loss applied to the output of the dynamic filter, a spectral norm-based orthogonal loss, and an inner-class distance loss applied in the KWS model. These losses are particularly useful in encouraging the network to extract discriminatory features in unseen noise environments.
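
The abstract names the three components of the LOVO loss but does not give their formulas. The NumPy sketch below is only an illustration of how such terms are commonly written, not the authors' formulation. Every function name, the margin, the squared Euclidean distance, the choice of `||W^T W - I||_2` as the orthogonality penalty, the centroid-based inner-class term, and the weights `alpha`, `beta`, `gamma` are assumptions introduced here.

```python
import numpy as np

def cross_entropy(logits, labels):
    # Numerically stable softmax cross-entropy, averaged over the batch.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Assumed form: hinge on squared Euclidean distances with a fixed margin.
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()

def spectral_orthogonal_loss(weight):
    # Assumed form: spectral norm (largest singular value) of W^T W - I.
    gram = weight.T @ weight
    return np.linalg.norm(gram - np.eye(gram.shape[0]), ord=2)

def inner_class_distance_loss(features, labels):
    # Assumed form: mean squared distance of each sample to its class centroid.
    total = 0.0
    for c in np.unique(labels):
        members = features[labels == c]
        total += np.sum((members - members.mean(axis=0)) ** 2)
    return total / len(features)

def lovo_style_loss(logits, labels, anchor, positive, negative,
                    weight, features, alpha=1.0, beta=1.0, gamma=1.0):
    # Hypothetical weighted sum; the weighting used in the paper is not
    # stated in the abstract.
    return (cross_entropy(logits, labels)
            + alpha * triplet_loss(anchor, positive, negative)
            + beta * spectral_orthogonal_loss(weight)
            + gamma * inner_class_distance_loss(features, labels))
```

In this sketch the triplet term would take feature maps from the dynamic filter output, while the orthogonal and inner-class terms would take a weight matrix and intermediate features of the KWS model, matching the placement described in the abstract.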
