Paper Title

UAVM: Towards Unifying Audio and Visual Models

Authors

Yuan Gong, Alexander H. Liu, Andrew Rouditchenko, James Glass

Abstract

Conventional audio-visual models have independent audio and video branches. In this work, we unify the audio and visual branches by designing a Unified Audio-Visual Model (UAVM). The UAVM achieves a new state-of-the-art audio-visual event classification accuracy of 65.8% on VGGSound. More interestingly, we also find a few intriguing properties of UAVM that the modality-independent counterparts do not have.
