A Model You Can Hear: Audio Identification with Playable Prototypes

Paper title

A Model You Can Hear: Audio Identification with Playable Prototypes

Authors

Romain Loiseau, Baptiste Bouvier, Yann Teytaut, Elliot Vincent, Mathieu Aubry, Loic Landrieu

Abstract

Machine learning techniques have proved useful for classifying and analyzing audio content. However, recent methods typically rely on abstract and high-dimensional representations that are difficult to interpret. Inspired by transformation-invariant approaches developed for image and 3D data, we propose an audio identification model based on learnable spectral prototypes. Equipped with dedicated transformation networks, these prototypes can be used to cluster and classify input audio samples from large collections of sounds. Our model can be trained with or without supervision and reaches state-of-the-art results for speaker and instrument identification, while remaining easily interpretable. The code is available at: https://github.com/romainloiseau/a-model-you-can-hear
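The abstract does not specify what the transformation networks do, so the sketch below only illustrates the general transformation-invariant prototype idea it builds on, under a loudly stated assumption: each "transformation" is a single additive log-gain offset fitted in closed form, not a learned network. Every input spectrogram is reconstructed by each transformed prototype and assigned to the one with the lowest reconstruction error. For clustering, a prototype can then be re-estimated as the mean of its assigned samples with their fitted transformations undone. All names here (`fit_gain`, `transform`, `assign`, `update_prototype`) are hypothetical and are not taken from the linked repository.

```python
from typing import List, Sequence, Tuple

# Hypothetical layout: a log-magnitude spectrogram stored as rows (frames)
# of frequency bins. Prototypes have the same shape as the inputs.
Spectrogram = List[List[float]]


def fit_gain(x: Spectrogram, p: Spectrogram) -> float:
    """Closed-form least-squares offset g minimizing sum((x - (p + g))**2).

    ASSUMPTION: one additive log-gain stands in for the paper's learned
    transformation networks, purely for illustration.
    """
    diffs = [xv - pv for xr, pr in zip(x, p) for xv, pv in zip(xr, pr)]
    return sum(diffs) / len(diffs)


def transform(p: Spectrogram, g: float) -> Spectrogram:
    """Apply the assumed gain transformation to a prototype."""
    return [[v + g for v in row] for row in p]


def reconstruction_error(x: Spectrogram, y: Spectrogram) -> float:
    """Sum of squared differences between two same-shaped spectrograms."""
    return sum((xv - yv) ** 2 for xr, yr in zip(x, y) for xv, yv in zip(xr, yr))


def assign(x: Spectrogram, prototypes: Sequence[Spectrogram]) -> Tuple[int, float]:
    """Return (index, error) of the prototype that best reconstructs x
    after being transformed to fit it."""
    best_k, best_err = -1, float("inf")
    for k, p in enumerate(prototypes):
        err = reconstruction_error(x, transform(p, fit_gain(x, p)))
        if err < best_err:
            best_k, best_err = k, err
    return best_k, best_err


def update_prototype(samples: Sequence[Spectrogram], gains: Sequence[float]) -> Spectrogram:
    """Re-estimate one prototype from its assigned samples: the mean of each
    sample with its fitted gain removed (the least-squares optimum when the
    gains are held fixed)."""
    n = len(samples)
    rows, cols = len(samples[0]), len(samples[0][0])
    return [
        [sum(s[r][c] - g for s, g in zip(samples, gains)) / n for c in range(cols)]
        for r in range(rows)
    ]


if __name__ == "__main__":
    low = [[1.0, 0.0], [1.0, 0.0]]       # energy in the lower bin
    high = [[0.0, 1.0], [0.0, 1.0]]      # energy in the upper bin
    loud_low = [[5.0, 4.0], [5.0, 4.0]]  # `low`, 4 log-units louder
    print(assign(loud_low, [low, high]))  # (0, 0.0)
```

Because the gain is fitted per prototype, a louder copy of `low` is still matched to `low` with zero error, which is the kind of invariance the transformation step is meant to provide. In a learned model one would presumably replace the closed-form gain with transformation networks and optimize prototypes and networks jointly on the same reconstruction objective; since the prototypes live in the spectral domain, they remain directly inspectable.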
