Paper Title

On Compression Principle and Bayesian Optimization for Neural Networks

Paper Authors

Tetelman, Michael

Paper Abstract

Finding methods for making generalizable predictions is a fundamental problem of machine learning. By looking into the similarities between the problem of predicting unknown data and lossless compression, we have found an approach that gives a solution. In this paper we propose a compression principle, which states that the optimal predictive model is the one that minimizes the total compressed message length of all data and the model definition while guaranteeing decodability. Following the compression principle, we use a Bayesian approach to build probabilistic models of data and network definitions. A method for approximating Bayesian integrals with a sequence of variational approximations is implemented as an optimizer for hyper-parameters: Bayesian Stochastic Gradient Descent (BSGD). Training with BSGD is completely defined by setting only three parameters: the number of epochs, the size of the dataset, and the size of the minibatch, which together define the learning rate and the number of iterations. We show that dropout can be used for continuous dimensionality reduction, which allows finding the optimal network dimensions as required by the compression principle.
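As a rough illustration of the claim that BSGD training is fully specified by three parameters, the sketch below (a hypothetical helper, not code from the paper) shows how the number of iterations follows from the number of epochs, the dataset size, and the minibatch size; the learning-rate formula that the abstract says is also determined by these parameters is defined in the paper itself and is not reproduced here.

import math

def bsgd_schedule(num_epochs, dataset_size, minibatch_size):
    # Hypothetical helper for illustration only; not taken from the paper.
    # The iteration count follows from how many minibatches are needed
    # to cover the dataset in each epoch.
    steps_per_epoch = math.ceil(dataset_size / minibatch_size)
    num_iterations = num_epochs * steps_per_epoch
    # The abstract states that the same three parameters also define the
    # learning rate; that formula is given in the paper, not assumed here.
    return num_iterations

# Example with made-up values: 10 epochs over 60,000 samples in minibatches of 128.
print(bsgd_schedule(num_epochs=10, dataset_size=60000, minibatch_size=128))  # 4690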
