Paper title
Wavebender GAN: An architecture for phonetically meaningful speech manipulation
Paper authors
Paper abstract
Deep learning has revolutionised synthetic speech quality. However, it has thus far delivered little value to the speech science community. The new methods do not meet the controllability demands of practitioners in this area, e.g., in listening tests with manipulated speech stimuli. Instead, control of different speech properties in such stimuli is achieved using legacy signal-processing methods. This limits the range, accuracy, and speech quality of the manipulations. Audible artefacts also have a negative impact on the methodological validity of results in speech perception studies. This work introduces a system capable of manipulating speech properties through learning rather than design. The architecture learns to control arbitrary speech properties and leverages progress in neural vocoders to obtain realistic output. Experiments with copy synthesis and manipulation of a small set of core speech features (pitch, formants, and voice quality measures) illustrate the promise of the approach for producing speech stimuli that have accurate control and high perceptual quality.