Title
Automated detection of foreground speech with wearable sensing in everyday home environments: A transfer learning approach
Authors
Abstract
Acoustic sensing has proved effective as a foundation for numerous applications in health and human behavior analysis. In this work, we focus on the problem of detecting in-person social interactions in naturalistic settings from audio captured by a smartwatch. As a first step towards detecting social interactions, it is critical to distinguish the speech of the individual wearing the watch from all other nearby sounds, such as speech from other individuals and ambient noise. This is very challenging in realistic settings, where interactions take place spontaneously and supervised models cannot be trained a priori to recognize the full complexity of dynamic social environments. In this paper, we introduce a transfer learning-based approach to detect foreground speech of users wearing a smartwatch. A highlight of the method is that it does not depend on the collection of voice samples to build user-specific models. Instead, the approach is based on knowledge transfer from general-purpose speaker representations derived from public datasets. To evaluate the method, we collected a dataset of 31 hours of smartwatch-recorded audio in 18 homes, with a total of 39 participants performing various semi-controlled tasks. Our experiments demonstrate that our approach performs comparably to a fully supervised model, achieving an F1 score of 80%.