Paper Title

AudioViewer: Learning to Visualize Sounds

Authors

Chunjin Song, Yuchi Zhang, Willis Peng, Parmis Mohaghegh, Bastian Wandt, Helge Rhodin

Abstract

A long-standing goal in the field of sensory substitution is to enable sound perception for deaf and hard of hearing (DHH) people by visualizing audio content. Different from existing models that translate to hand sign language, between speech and text, or text and images, we target immediate and low-level audio to video translation that applies to generic environment sounds as well as human speech. Since such a substitution is artificial, without labels for supervised learning, our core contribution is to build a mapping from audio to video that learns from unpaired examples via high-level constraints. For speech, we additionally disentangle content from style, such as gender and dialect. Qualitative and quantitative results, including a human study, demonstrate that our unpaired translation approach maintains important audio features in the generated video and that videos of faces and numbers are well suited for visualizing high-dimensional audio features that can be parsed by humans to match and distinguish between sounds and words. Code and models are available at https://chunjinsong.github.io/audioviewer
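The abstract's central idea is a mapping from audio to video: each short audio window is encoded into a latent code, which an image decoder then renders as a video frame. The sketch below illustrates only that data flow, with untrained linear maps standing in for the paper's learned encoder and decoder; all dimensions, function names, and shapes are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 80 mel bins per audio frame, a 64-d shared
# latent space, and 32x32 grayscale output frames (stand-ins only).
N_MEL, D_LATENT, IMG = 80, 64, 32

# Untrained linear maps standing in for the learned audio encoder and
# image decoder that would share a common latent space after training.
W_enc = rng.standard_normal((D_LATENT, N_MEL)) / np.sqrt(N_MEL)
W_dec = rng.standard_normal((IMG * IMG, D_LATENT)) / np.sqrt(D_LATENT)

def audio_to_video(mel_frames):
    """Map a (T, N_MEL) spectrogram to a (T, IMG, IMG) frame sequence."""
    z = mel_frames @ W_enc.T   # encode each audio frame into the latent space
    pixels = z @ W_dec.T       # decode each latent code into image pixels
    return pixels.reshape(-1, IMG, IMG)

mel = rng.standard_normal((10, N_MEL))  # 10 frames of placeholder features
video = audio_to_video(mel)
print(video.shape)  # (10, 32, 32)
```

In the paper's unpaired setting, the encoder and decoder would be trained with high-level constraints (e.g., cycle consistency across modalities) rather than with paired audio-video labels, which is why no supervised targets appear in this sketch.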
