Paper title

Real time spectrogram inversion on mobile phone

Authors

Oleg Rybakov, Marco Tagliasacchi, Yunpeng Li, Liyang Jiang, Xia Zhang, Fadi Biadsy

Abstract

We present two methods for real-time magnitude spectrogram inversion: streaming Griffin-Lim (GL) and streaming MelGAN. We demonstrate the impact of lookahead on the perceptual quality of MelGAN: as little as one hop size (12.5 ms) of lookahead significantly improves perceptual quality compared with the causal version. We compare streaming GL with streaming MelGAN and show different trade-offs in terms of perceptual quality, on-device latency, algorithmic delay, memory footprint, and noise sensitivity. For a fair quality assessment of the GL approach, we use the input log-magnitude spectrogram without mel transformation. We evaluate the presented real-time spectrogram inversion approaches on clean, noisy, and atypical speech. We identify the conditions under which streaming GL has quality comparable to MelGAN: noisy audio and no mel transformation. Streaming GL runs 2.4x faster than real time on the ARM CPU of a Pixel4 and uses 4.5x less memory than MelGAN.
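
To make the timing figures in the abstract concrete, the short sketch below works through the arithmetic. It assumes that "2.4x faster than real time" means the compute time is the audio duration divided by 2.4, and that each hop of lookahead adds one hop of waiting to the algorithmic delay. The function names are hypothetical and do not come from the paper. Only the hop size (12.5 ms) and the speed-up (2.4x) are taken from the abstract; the lookahead result applies to streaming MelGAN and the speed-up to streaming GL.

```python
def compute_time_per_hop_ms(hop_ms: float, speedup: float) -> float:
    """Compute time spent per hop, assuming processing runs `speedup`
    times faster than real time (an interpretation, not a measured value)."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return hop_ms / speedup


def lookahead_delay_ms(hop_ms: float, lookahead_hops: int) -> float:
    """Delay contributed by waiting for `lookahead_hops` future frames.
    This is only the lookahead part, not the total algorithmic delay."""
    if lookahead_hops < 0:
        raise ValueError("lookahead_hops must be non-negative")
    return hop_ms * lookahead_hops


HOP_MS = 12.5      # hop size stated in the abstract
GL_SPEEDUP = 2.4   # streaming GL on the Pixel4 ARM CPU, stated in the abstract

per_hop = compute_time_per_hop_ms(HOP_MS, GL_SPEEDUP)
headroom = HOP_MS - per_hop  # time left in each hop for other work

print(f"streaming GL compute per hop: {per_hop:.2f} ms")
print(f"headroom per hop: {headroom:.2f} ms")
print(f"delay from one hop of lookahead: {lookahead_delay_ms(HOP_MS, 1):.1f} ms")
```

Under these assumptions, streaming GL would spend a little over 5 ms of compute on each 12.5 ms hop, and a causal model (zero hops of lookahead) contributes no lookahead delay at all.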
