Paper title
Personalized Acoustic Echo Cancellation for Full-duplex Communications
Authors
Abstract
Deep neural networks (DNNs) have shown promising results for acoustic echo cancellation (AEC). However, DNN-based AEC models let through all near-end speakers, including interfering speech. In light of recent studies on personalized speech enhancement, this paper investigates the feasibility of personalized acoustic echo cancellation (PAEC) for full-duplex communications, where background noise and interfering speakers may coexist with acoustic echoes. Specifically, we first propose a novel backbone neural network, termed the gated temporal convolutional neural network (GTCNN), that outperforms state-of-the-art AEC models. Speaker embeddings such as d-vectors are then adopted as auxiliary information to guide the GTCNN to focus on the target speaker. A special case of PAEC is when speech snippets of both parties on the call are enrolled. Experimental results show that auxiliary information from either the near-end speaker or the far-end speaker can improve DNN-based AEC performance. Nevertheless, there is still much room for improvement in how the finite-dimensional speaker embeddings are utilized.
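The abstract does not specify how the GTCNN is built or how the speaker embedding is fused with it, so the following is only a rough illustration of the general idea, not the authors' architecture. It sketches one gated, causal, dilated temporal convolution whose input frames are concatenated with a fixed speaker embedding standing in for a d-vector. The gating form (tanh filter times sigmoid gate), the concatenation-based conditioning, and all names (`causal_dilated_conv`, `gated_block`, `init_params`) and sizes are assumptions made for this example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_params(c_in, c_out, kernel, rng):
    """Random weights for the filter and gate branches (hypothetical layout)."""
    def conv_weights():
        return [[[rng.uniform(-0.5, 0.5) for _ in range(c_in)]
                 for _ in range(c_out)] for _ in range(kernel)]
    return {"wf": conv_weights(), "bf": [0.0] * c_out,
            "wg": conv_weights(), "bg": [0.0] * c_out}


def causal_dilated_conv(frames, weights, bias, dilation):
    """1-D convolution over time that only looks at current and past frames.

    frames:  list of T feature vectors, each of length C_in
    weights: list of K matrices indexed as [tap][out_channel][in_channel]
    """
    kernel, c_out, c_in = len(weights), len(bias), len(frames[0])
    out = []
    for t in range(len(frames)):
        y = list(bias)
        for k in range(kernel):
            src = t - (kernel - 1 - k) * dilation
            if src < 0:
                continue  # implicit zero padding on the left (causal)
            x = frames[src]
            for o in range(c_out):
                y[o] += sum(weights[k][o][i] * x[i] for i in range(c_in))
        out.append(y)
    return out


def gated_block(frames, speaker_emb, params, dilation):
    """Gated temporal convolution conditioned on a speaker embedding.

    Assumption: conditioning is done by appending the same embedding
    to every frame before the convolution.
    """
    conditioned = [f + speaker_emb for f in frames]  # list concatenation
    filt = causal_dilated_conv(conditioned, params["wf"], params["bf"], dilation)
    gate = causal_dilated_conv(conditioned, params["wg"], params["bg"], dilation)
    return [[math.tanh(a) * sigmoid(b) for a, b in zip(fa, ga)]
            for fa, ga in zip(filt, gate)]


rng = random.Random(0)
frames = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(6)]  # 6 frames, 4 features
d_vector = [rng.uniform(-1, 1) for _ in range(3)]                    # toy 3-dim embedding
params = init_params(c_in=4 + 3, c_out=5, kernel=3, rng=rng)
out = gated_block(frames, d_vector, params, dilation=2)
print(len(out), len(out[0]))
```

Because the convolution is causal, each output frame depends only on the current and earlier input frames, which is the property a real-time full-duplex system would need. A practical model would stack many such blocks with growing dilation and learn the weights from data.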