Timbre latent space: exploration and creative aspects

Paper title

Timbre latent space: exploration and creative aspects

Authors

Caillon, Antoine; Bitton, Adrien; Gatinet, Brice; Esling, Philippe

Abstract

Recent studies show that unsupervised models can learn invertible audio representations using Auto-Encoders. These models enable high-quality sound synthesis but offer limited control, since their latent spaces do not disentangle timbre properties. The emergence of disentangled representations has been studied in Variational Auto-Encoders (VAEs) and applied to audio. An additional perceptual regularization can align such latent representations with previously established multi-dimensional timbre spaces while still allowing continuous inference and synthesis. Alternatively, specific sound attributes can be learned as control variables while unsupervised dimensions account for the remaining features. Generative neural networks thus enable new possibilities for timbre manipulation, although the exploration and creative use of their representations remain limited. The following experiments were conducted in cooperation with two composers and propose new creative directions for exploring latent sound synthesis of musical timbres, using specifically designed interfaces (Max/MSP, Pure Data) or mappings for descriptor-based synthesis.
