Modeling Musical Structure with Artificial Neural Networks

Thesis title

Modeling Musical Structure with Artificial Neural Networks

Author

Lattner, Stefan

Abstract

In recent years, artificial neural networks (ANNs) have become a universal tool for tackling real-world problems. ANNs have also shown great success in music-related tasks including music summarization and classification, similarity estimation, computer-aided or autonomous composition, and automatic music analysis. As structure is a fundamental characteristic of Western music, it plays a role in all these tasks. Some structural aspects are particularly challenging to learn with current ANN architectures. This is especially true for mid- and high-level self-similarity, tonal and rhythmic relationships. In this thesis, I explore the application of ANNs to different aspects of musical structure modeling, identify some challenges involved and propose strategies to address them. First, using probability estimations of a Restricted Boltzmann Machine (RBM), a probabilistic bottom-up approach to melody segmentation is studied. Then, a top-down method for imposing a high-level structural template in music generation is presented, which combines Gibbs sampling using a convolutional RBM with gradient-descent optimization on the intermediate solutions. Furthermore, I motivate the relevance of musical transformations in structure modeling and show how a connectionist model, the Gated Autoencoder (GAE), can be employed to learn transformations between musical fragments. For learning transformations in sequences, I propose a special predictive training of the GAE, which yields a representation of polyphonic music as a sequence of intervals. Furthermore, the applicability of these interval representations to a top-down discovery of repeated musical sections is shown. Finally, a recurrent variant of the GAE is proposed, and its efficacy in music prediction and modeling of low-level repetition structure is demonstrated.
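
To make the idea of learning a transformation between two musical fragments more concrete, the following is a minimal sketch of the forward pass of a factored gated autoencoder. It is an illustration under stated assumptions, not the implementation used in the thesis: the class name `GatedAutoencoder`, the layer sizes, the random untrained weights, and the 24-pitch binary input window are all hypothetical choices. The mapping units `m` are computed from the element-wise product of two projected inputs, and `y` is reconstructed from `x` and `m`.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class GatedAutoencoder:
    """Factored gated autoencoder, forward pass only (weights are untrained)."""

    def __init__(self, n_in, n_factors, n_maps, seed=0):
        rng = np.random.default_rng(seed)
        # U and V project the two inputs into a shared factor space;
        # W maps factor products to the mapping (transformation) units.
        self.U = rng.normal(0.0, 0.1, (n_factors, n_in))
        self.V = rng.normal(0.0, 0.1, (n_factors, n_in))
        self.W = rng.normal(0.0, 0.1, (n_maps, n_factors))

    def encode(self, x, y):
        # Mapping code: depends on how x relates to y, via a multiplicative interaction.
        return sigmoid(self.W @ ((self.U @ x) * (self.V @ y)))

    def decode(self, x, m):
        # Reconstruct y by applying the encoded transformation m to x.
        return self.V.T @ ((self.U @ x) * (self.W.T @ m))

    def reconstruction_error(self, x, y):
        m = self.encode(x, y)
        return float(np.mean((self.decode(x, m) - y) ** 2))


# Hypothetical example: a C major triad and the same chord shifted up 3 semitones.
x = np.zeros(24)
x[[0, 4, 7]] = 1.0
y = np.roll(x, 3)

gae = GatedAutoencoder(n_in=24, n_factors=16, n_maps=8)
m = gae.encode(x, y)        # code for the x -> y relation
y_hat = gae.decode(x, m)    # reconstruction of y (poor until the weights are trained)
err = gae.reconstruction_error(x, y)
```

In a trained model of this kind, minimizing the reconstruction error over many fragment pairs would push the mapping units to encode the relation between `x` and `y` (here, a transposition) rather than the content of either fragment.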
