Paper Title
Understanding Global Loss Landscape of One-hidden-layer ReLU Networks, Part 2: Experiments and Analysis
Paper Authors
Paper Abstract
The existence of local minima for one-hidden-layer ReLU networks has been investigated theoretically in [8]. Building on that theory, in this paper we first analyze how large the probability that local minima exist is for 1D Gaussian data, and how it varies across the whole weight space. We show that this probability is very low in most regions. We then design and implement a linear-programming-based approach to judge the existence of genuine local minima, and use it to predict whether bad local minima exist for the MNIST and CIFAR-10 datasets; we find that there are no bad differentiable local minima almost everywhere in weight space once some hidden neurons are activated by samples. These theoretical predictions are verified experimentally by showing that gradient descent does not get trapped in the cells from which it starts. We also perform experiments to explore the count and size of differentiable cells in the weight space.
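The paper's exact linear program is given in [8] and is not reproduced here. As a rough, hypothetical illustration of how linear programming enters this kind of analysis, the sketch below checks whether a "cell" (a fixed ReLU activation pattern over the training samples) is non-empty, i.e. whether any hidden weights realize that pattern. The function name `cell_nonempty` and the margin-1 normalization are our own assumptions, not the authors' formulation.

```python
import numpy as np
from scipy.optimize import linprog

def cell_nonempty(X, pattern):
    """LP feasibility check: does any weight matrix realize this activation pattern?

    X: (n, d) array of samples.
    pattern: (n, k) boolean array; pattern[i, j] is True iff hidden unit j
             should be active (w_j . x_i > 0) on sample i.
    Since the pattern is invariant to positive scaling of w, strict
    inequalities are replaced by a margin of 1: s_ij * (w_j . x_i) >= 1.
    """
    n, d = X.shape
    k = pattern.shape[1]
    A_ub, b_ub = [], []
    for i in range(n):
        for j in range(k):
            s = 1.0 if pattern[i, j] else -1.0
            row = np.zeros(k * d)
            # encode s * (w_j . x_i) >= 1 as -s * (w_j . x_i) <= -1
            row[j * d:(j + 1) * d] = -s * X[i]
            A_ub.append(row)
            b_ub.append(-1.0)
    res = linprog(c=np.zeros(k * d), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * (k * d), method="highs")
    return res.status == 0  # 0 = feasible (optimal), 2 = infeasible

# Example: in 1D, one hidden unit cannot be active on both x=1 and x=-1.
X = np.array([[1.0], [-1.0]])
print(cell_nonempty(X, np.array([[True], [False]])))  # feasible
print(cell_nonempty(X, np.array([[True], [True]])))   # infeasible
```

Inside a non-empty cell the network is linear in its weights, so the loss is quadratic there; deciding whether a genuine local minimum sits inside the cell then reduces to checking its stationary point against the cell's linear inequalities, which is the kind of question the paper's LP-based method answers at scale.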