Paper Title

A Model You Can Hear: Audio Identification with Playable Prototypes

Authors

Romain Loiseau, Baptiste Bouvier, Yann Teytaut, Elliot Vincent, Mathieu Aubry, Loic Landrieu

Abstract

Machine learning techniques have proved useful for classifying and analyzing audio content. However, recent methods typically rely on abstract and high-dimensional representations that are difficult to interpret. Inspired by transformation-invariant approaches developed for image and 3D data, we propose an audio identification model based on learnable spectral prototypes. Equipped with dedicated transformation networks, these prototypes can be used to cluster and classify input audio samples from large collections of sounds. Our model can be trained with or without supervision and reaches state-of-the-art results for speaker and instrument identification, while remaining easily interpretable. The code is available at: https://github.com/romainloiseau/a-model-you-can-hear
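
To make the idea of prototype-based identification concrete, here is a minimal sketch. It is not the authors' implementation. It assumes each class is summarized by one prototype spectrum and replaces the learned transformation networks with a toy stand-in: a non-negative gain and an offset fitted by least squares. The names `fit_gain_offset`, `reconstruction_error`, and `identify`, and all the spectra, are hypothetical. An input is assigned to the prototype that reconstructs it with the lowest error after transformation.

```python
from typing import Dict, List, Tuple

Spectrum = List[float]


def fit_gain_offset(prototype: Spectrum, target: Spectrum) -> Tuple[float, float]:
    """Least-squares gain >= 0 and offset so that gain * prototype + offset fits target.

    Toy stand-in (an assumption) for a learned transformation network.
    """
    n = len(prototype)
    mean_p = sum(prototype) / n
    mean_t = sum(target) / n
    var_p = sum((p - mean_p) ** 2 for p in prototype)
    if var_p == 0.0:
        return 0.0, mean_t  # flat prototype: only the offset can help
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(prototype, target))
    gain = max(cov / var_p, 0.0)  # forbid inverted spectra
    return gain, mean_t - gain * mean_p


def reconstruction_error(prototype: Spectrum, target: Spectrum) -> float:
    """Mean squared error between the transformed prototype and the target."""
    gain, offset = fit_gain_offset(prototype, target)
    residuals = (gain * p + offset - t for p, t in zip(prototype, target))
    return sum(r * r for r in residuals) / len(target)


def identify(target: Spectrum, prototypes: Dict[str, Spectrum]) -> str:
    """Return the name of the prototype that best reconstructs the target."""
    return min(prototypes, key=lambda name: reconstruction_error(prototypes[name], target))


# Hypothetical 6-bin prototype spectra for two classes.
prototypes = {
    "low_voice": [1.0, 0.8, 0.3, 0.1, 0.0, 0.0],
    "high_voice": [0.0, 0.1, 0.3, 0.8, 1.0, 0.6],
}

# A louder, slightly shifted version of the "low_voice" prototype.
sample = [2.1, 1.7, 0.7, 0.3, 0.1, 0.1]
print(identify(sample, prototypes))
```

Because every prediction is a transformed prototype, the winning prototype can be inspected directly (or, with real spectrograms, converted back to sound), which is the sense in which such a model remains interpretable.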
