Paper Title

Training neural networks using Metropolis Monte Carlo and an adaptive variant

Paper Authors

Stephen Whitelam, Viktor Selin, Ian Benlolo, Corneel Casert, Isaac Tamblyn

Paper Abstract

We examine the zero-temperature Metropolis Monte Carlo algorithm as a tool for training a neural network by minimizing a loss function. We find that, as expected on theoretical grounds and shown empirically by other authors, Metropolis Monte Carlo can train a neural net with an accuracy comparable to that of gradient descent, if not necessarily as quickly. The Metropolis algorithm does not fail automatically when the number of parameters of a neural network is large. It can fail when a neural network's structure or neuron activations are strongly heterogeneous, and we introduce an adaptive Monte Carlo algorithm, aMC, to overcome these limitations. The intrinsic stochasticity and numerical stability of the Monte Carlo method allow aMC to train deep neural networks and recurrent neural networks in which the gradient is too small or too large to allow training by gradient descent. Monte Carlo methods offer a complement to gradient-based methods for training neural networks, allowing access to a distinct set of network architectures and principles.
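To make the core idea concrete, below is a minimal sketch of the plain zero-temperature Metropolis loop the abstract describes: perturb all parameters with Gaussian noise and accept the move only if the loss does not increase, with no gradients computed. The toy regression task, two-layer network, and step size sigma are illustrative assumptions, not details from the paper, and this shows the basic algorithm rather than the authors' adaptive aMC variant.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (illustrative assumption, not from the paper).
X = rng.normal(size=(64, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]

def init_params():
    # Small two-layer network, 2 -> 16 -> 1 (illustrative architecture).
    return [rng.normal(scale=0.5, size=(2, 16)),
            np.zeros(16),
            rng.normal(scale=0.5, size=(16, 1)),
            np.zeros(1)]

def forward(params, X):
    W1, b1, W2, b2 = params
    h = np.tanh(X @ W1 + b1)
    return (h @ W2 + b2).ravel()

def loss(params):
    return np.mean((forward(params, X) - y) ** 2)

# Zero-temperature Metropolis Monte Carlo: propose a Gaussian
# perturbation of every parameter and accept the move only if the
# loss does not increase (the beta -> infinity limit of the usual
# Metropolis acceptance rule).
params = init_params()
sigma = 0.02  # proposal step size (illustrative choice)
best = loss(params)
for step in range(20000):
    proposal = [p + rng.normal(scale=sigma, size=p.shape) for p in params]
    trial = loss(proposal)
    if trial <= best:  # zero temperature: reject any uphill move
        params, best = proposal, trial

print(f"final loss: {best:.4f}")
```

The fixed global sigma here is exactly what the paper's aMC variant relaxes: when a network's structure or activations are strongly heterogeneous, a single step size is a poor fit for all parameters, and adapting the proposal distribution restores trainability.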
