Paper Title

Incorporating Broad Phonetic Information for Speech Enhancement

Authors

Yen-Ju Lu, Chien-Feng Liao, Xugang Lu, Jeih-weih Hung, Yu Tsao

Abstract

In noisy conditions, knowing the speech content helps listeners suppress background noise components more effectively and retrieve the clean speech signal. Previous studies have also confirmed that incorporating phonetic information into a speech enhancement (SE) system improves denoising performance. To obtain the phonetic information, a phoneme-based acoustic model is usually prepared, trained on speech waveforms and phoneme labels. Although such a model performs well in normal noisy conditions, in very noisy conditions the recognized phonemes may be erroneous and thus misguide the SE process. To overcome this limitation, this study proposes incorporating broad phonetic class (BPC) information into the SE process. We investigated three criteria for building the BPCs: two knowledge-based criteria (place and manner of articulation) and one data-driven criterion. Because the recognition accuracies of BPCs are much higher than those of phonemes, BPCs provide more accurate phonetic information to guide the SE process under very noisy conditions. Experimental results on the TIMIT dataset demonstrate that the proposed SE framework with BPC information achieves notable improvements in both speech quality and intelligibility over the baseline system and over an SE system using monophonic information.
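The idea of grouping phonemes into broad phonetic classes can be illustrated with a minimal Python sketch. The manner-of-articulation grouping below is an assumption based on conventional phonetic categories and TIMIT-style phone labels; it is not the exact class table used in the paper, and the names `MANNER_BPC`, `phonemes_to_bpc`, and `accuracy` are hypothetical. The sketch shows why BPC recognition tends to be more robust: a recognizer that confuses two phonemes of the same class still yields the correct BPC label.

```python
# Illustrative manner-of-articulation grouping (an assumption, not the
# paper's exact class definition). Labels follow TIMIT-style phone names.
MANNER_BPC = {
    "vowel": ["iy", "ih", "eh", "ae", "aa", "ah", "ao", "uh", "uw",
              "er", "ey", "ay", "oy", "aw", "ow"],
    "stop": ["b", "d", "g", "p", "t", "k"],
    "fricative": ["s", "z", "sh", "zh", "f", "v", "th", "dh", "hh"],
    "affricate": ["ch", "jh"],
    "nasal": ["m", "n", "ng"],
    "semivowel": ["l", "r", "w", "y"],
    "silence": ["sil"],
}

# Invert the table into a phoneme -> class lookup.
PHONEME_TO_BPC = {p: c for c, phones in MANNER_BPC.items() for p in phones}


def phonemes_to_bpc(phonemes):
    """Map a sequence of phoneme labels to their broad phonetic classes."""
    return [PHONEME_TO_BPC[p] for p in phonemes]


def accuracy(reference, hypothesis):
    """Fraction of positions where two equal-length label sequences agree."""
    if len(reference) != len(hypothesis):
        raise ValueError("sequences must have the same length")
    matches = sum(r == h for r, h in zip(reference, hypothesis))
    return matches / len(reference)


# Hypothetical frame labels: the recognizer confuses /b/ with /d/ and
# /t/ with /k/, which are all stops.
ref = ["sil", "b", "ae", "t", "sil"]
hyp = ["sil", "d", "ae", "k", "sil"]

print(accuracy(ref, hyp))                                    # phoneme level
print(accuracy(phonemes_to_bpc(ref), phonemes_to_bpc(hyp)))  # BPC level
```

In this toy case the two phoneme errors fall within the same class, so the BPC-level sequence is unaffected. Actual gains depend on the recognizer and the noise condition.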
