Scene-Agnostic Multi-Microphone Speech Dereverberation

Paper Title

Scene-Agnostic Multi-Microphone Speech Dereverberation

Authors

Yochai Yemini, Ethan Fetaya, Haggai Maron, Sharon Gannot

Abstract

Neural networks (NNs) have been widely applied in speech processing tasks, and, in particular, those employing microphone arrays. Nevertheless, most existing NN architectures can only deal with fixed and position-specific microphone arrays. In this paper, we present an NN architecture that can cope with microphone arrays whose number and positions of the microphones are unknown, and demonstrate its applicability in the speech dereverberation task. To this end, our approach harnesses recent advances in deep learning on set-structured data to design an architecture that enhances the reverberant log-spectrum. We use noisy and noiseless versions of a simulated reverberant dataset to test the proposed architecture. Our experiments on the noisy data show that the proposed scene-agnostic setup outperforms a powerful scene-aware framework, sometimes even with fewer microphones. With the noiseless dataset we show that, in most cases, our method outperforms the position-aware network as well as the state-of-the-art weighted linear prediction error (WPE) algorithm.
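
To illustrate the general idea of deep learning on set-structured data that the abstract refers to, the following is a minimal sketch of a Deep Sets-style permutation-equivariant layer followed by mean pooling over the microphone axis. It is not the architecture proposed in the paper: the function names (`equivariant_layer`, `set_encoder`), the feature sizes, and the random weights are all assumptions made for illustration. The point is only that such a construction accepts any number of microphones and ignores their ordering.

```python
import numpy as np

def equivariant_layer(x, w_self, w_mean, bias):
    """Hypothetical Deep Sets-style layer.

    x has shape (num_mics, num_features). Each microphone is transformed
    by the same weights, plus a term computed from the mean over all
    microphones, so permuting the rows of x permutes the output rows
    in the same way.
    """
    pooled = x.mean(axis=0, keepdims=True)  # (1, num_features)
    return np.maximum(x @ w_self + pooled @ w_mean + bias, 0.0)  # ReLU

def set_encoder(x, params):
    """Equivariant layer followed by mean pooling over microphones.

    The result has a fixed size regardless of how many microphones
    are present, and does not depend on their order.
    """
    hidden = equivariant_layer(x, *params)
    return hidden.mean(axis=0)

# Illustrative sizes and random weights (assumptions, not from the paper).
rng = np.random.default_rng(0)
num_features, num_out = 8, 4
params = (
    rng.standard_normal((num_features, num_out)),  # per-microphone weights
    rng.standard_normal((num_features, num_out)),  # weights on the set mean
    rng.standard_normal(num_out),                  # bias
)

# The same parameters handle arrays with different microphone counts.
features_5_mics = rng.standard_normal((5, num_features))
features_3_mics = rng.standard_normal((3, num_features))
embedding_5 = set_encoder(features_5_mics, params)
embedding_3 = set_encoder(features_3_mics, params)
```

Because the weights are shared across microphones and the only cross-microphone interaction is a mean, the parameter count does not grow with the array size, which is what lets one trained model be applied to arrays it has not seen.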
