Paper title
Extreme Audio Time Stretching Using Neural Synthesis
Paper authors
Paper abstract
A deep neural network solution for time-scale modification (TSM) focused on large stretching factors is proposed, targeting environmental sounds. Traditional TSM artifacts such as transient smearing, loss of presence, and phasiness are heavily accentuated and cause poor audio quality when the TSM factor is four or larger. The weakness of established TSM methods, often based on a phase vocoder structure, lies in the poor description and scaling of the transient and noise components, or nuances, of a sound. Our novel solution combines a sines-transients-noise decomposition with an independent WaveNet synthesizer to provide a better description of the noise component and improved sound quality for large stretching factors. Results of a subjective listening test against four other TSM algorithms are reported, showing the proposed method to be often superior. The proposed method is stereo compatible and has a wide range of applications related to the slow motion of media content.