Source-Aware Neural Speech Coding for Noisy Speech Compression

Paper Title

Source-Aware Neural Speech Coding for Noisy Speech Compression

Authors

Haici Yang, Kai Zhen, Seungkwon Beack, Minje Kim

Abstract

This paper introduces a novel neural network-based speech coding system that can process noisy speech effectively. The proposed source-aware neural audio coding (SANAC) system harmonizes a deep autoencoder-based source separation model and a neural coding system so that it can explicitly perform source separation and coding in the latent space. An added benefit of this system is that the codec can allocate different numbers of bits to the underlying sources, so that the more important source sounds better in the decoded signal. We target a new use case where the user on the receiver side cares about the quality of the non-speech components in speech communication, while the speech source still carries the most crucial information. Both objective and subjective evaluation tests show that SANAC can recover the original noisy speech better than two conventional codecs and a baseline neural audio coding system that has no source-aware coding mechanism.
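
The idea of giving the underlying sources different bit budgets can be illustrated with a toy sketch. This is not the SANAC implementation: the abstract does not describe the quantizer or the allocation rule, so the uniform scalar quantizer, the `allocate_bits` helper, the 75% speech share, and the random "latent" values below are all assumptions made purely for illustration.

```python
import random

def quantize(values, bits):
    """Uniform scalar quantization of values in [-1, 1] to 2**bits levels (assumed scheme)."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)
    out = []
    for v in values:
        clipped = max(-1.0, min(1.0, v))
        index = round((clipped + 1.0) / step)  # integer code that would be transmitted
        out.append(index * step - 1.0)         # decoder-side reconstruction
    return out

def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def allocate_bits(total_bits, speech_share):
    """Hypothetical rule: split a per-sample bit budget between speech and noise."""
    speech_bits = round(total_bits * speech_share)
    return speech_bits, total_bits - speech_bits

# Stand-ins for the separated latent features of each source (random toy data).
rng = random.Random(0)
speech_latent = [rng.uniform(-1, 1) for _ in range(1000)]
noise_latent = [rng.uniform(-1, 1) for _ in range(1000)]

# Give the speech source most of the budget, the non-speech source the rest.
speech_bits, noise_bits = allocate_bits(total_bits=8, speech_share=0.75)
speech_err = mse(speech_latent, quantize(speech_latent, speech_bits))
noise_err = mse(noise_latent, quantize(noise_latent, noise_bits))

print(f"speech: {speech_bits} bits, MSE {speech_err:.6f}")
print(f"noise:  {noise_bits} bits, MSE {noise_err:.6f}")
```

In this toy setting the source that receives more bits is reconstructed with lower error, which mirrors the trade-off the abstract describes: the non-speech component is still transmitted, but the speech component is favored.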
