ClearBuds: Wireless Binaural Earbuds for Learning-Based Speech Enhancement

Paper Title

ClearBuds: Wireless Binaural Earbuds for Learning-Based Speech Enhancement

Authors

Ishan Chatterjee, Maruchi Kim, Vivek Jayaram, Shyamnath Gollakota, Ira Kemelmacher-Shlizerman, Shwetak Patel, Steven M. Seitz

Abstract

We present ClearBuds, the first hardware and software system that uses a neural network to enhance speech streamed from two wireless earbuds. Real-time speech enhancement for wireless earbuds requires high-quality sound separation and background cancellation that run in real time on a mobile phone. ClearBuds bridges state-of-the-art deep learning for blind audio source separation and in-ear mobile systems by making two key technical contributions: 1) a new wireless earbud design capable of operating as a synchronized, binaural microphone array, and 2) a lightweight dual-channel speech enhancement neural network that runs on a mobile device. Our neural network has a novel cascaded architecture that combines a time-domain conventional neural network with a spectrogram-based frequency masking neural network to reduce artifacts in the audio output. Results show that our wireless earbuds achieve a synchronization error of less than 64 microseconds and that our network has a runtime of 21.4 milliseconds on an accompanying mobile phone. In-the-wild evaluation with eight users in previously unseen indoor and outdoor multipath scenarios demonstrates that our neural network generalizes to learn both spatial and acoustic cues to perform noise suppression and background speech removal. In a user study with 37 participants who spent over 15.4 hours rating 1041 audio samples collected in the wild, our system achieves improved mean opinion score and background noise suppression. Project page with demos: https://clearbuds.cs.washington.edu
