Personalized Acoustic Echo Cancellation for Full-duplex Communications

Paper Title

Personalized Acoustic Echo Cancellation for Full-duplex Communications

Authors

Shimin Zhang, Ziteng Wang, Yukai Ju, Yihui Fu, Yueyue Na, Qiang Fu, Lei Xie

Abstract

Deep neural networks (DNNs) have shown promising results for acoustic echo cancellation (AEC). However, DNN-based AEC models let through all near-end speakers, including interfering speech. In light of recent studies on personalized speech enhancement, we investigate the feasibility of personalized acoustic echo cancellation (PAEC) for full-duplex communications, where background noise and interfering speakers may coexist with acoustic echoes. Specifically, we first propose a novel backbone neural network, termed the gated temporal convolutional neural network (GTCNN), that outperforms state-of-the-art AEC models. Speaker embeddings such as d-vectors are then adopted as auxiliary information to guide the GTCNN to focus on the target speaker. A special case in PAEC is that speech snippets of both parties on the call are enrolled. Experimental results show that auxiliary information from either the near-end speaker or the far-end speaker can improve DNN-based AEC performance. Nevertheless, there is still much room for improvement in the utilization of the finite-dimensional speaker embeddings.
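
The two ideas named in the abstract, a gated temporal convolution and guidance from a speaker embedding, can be illustrated with a small pure-Python sketch. The abstract does not describe the GTCNN layers or how the d-vector is injected, so everything here is an assumption for illustration only: the tanh/sigmoid gate, the causal dilated taps, conditioning by concatenating the embedding to every frame, and all names and shapes (`gated_temporal_conv`, `random_weights`, `T`, `F`, `D`, `C`, `K`) are hypothetical and not taken from the paper.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_temporal_conv(frames, d_vector, w_filter, w_gate, dilation=1):
    """Causal dilated 1-D convolution with a tanh/sigmoid gate.

    frames:   T feature vectors of length F (hypothetical input features)
    d_vector: speaker embedding of length D, appended to every frame
    w_filter, w_gate: weights indexed [tap][out_channel][in_channel],
                      where in_channel runs over F + D
    """
    # Conditioning by concatenation is an assumption, not the paper's stated method.
    inputs = [list(f) + list(d_vector) for f in frames]
    taps = len(w_filter)
    channels = len(w_filter[0])
    outputs = []
    for t in range(len(inputs)):
        row = []
        for c in range(channels):
            f_sum = 0.0
            g_sum = 0.0
            for k in range(taps):
                src = t - k * dilation  # only the current and past frames (causal)
                if src < 0:
                    continue  # implicit zero padding before the first frame
                x = inputs[src]
                f_sum += sum(w * v for w, v in zip(w_filter[k][c], x))
                g_sum += sum(w * v for w, v in zip(w_gate[k][c], x))
            # The sigmoid branch gates how much of the tanh branch passes through.
            row.append(math.tanh(f_sum) * sigmoid(g_sum))
        outputs.append(row)
    return outputs


def random_weights(rng, taps, channels, in_dim):
    """Random placeholder weights; a trained model would learn these."""
    return [[[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
             for _ in range(channels)] for _ in range(taps)]


rng = random.Random(0)
T, F, D, C, K = 6, 4, 3, 2, 2  # frames, features, embedding size, channels, taps
frames = [[rng.uniform(-1, 1) for _ in range(F)] for _ in range(T)]
d_vector = [rng.uniform(-1, 1) for _ in range(D)]
w_f = random_weights(rng, K, C, F + D)
w_g = random_weights(rng, K, C, F + D)
gated = gated_temporal_conv(frames, d_vector, w_f, w_g, dilation=2)
```

Here `gated` holds one `C`-dimensional output per input frame. Because each output depends only on the current and earlier frames, a layer of this kind could in principle run frame by frame, which is the usual requirement in a live call; whether the paper's network is built this way is not stated in the abstract.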

