Paper title

Deep Networks on Toroids: Removing Symmetries Reveals the Structure of Flat Regions in the Landscape Geometry

Authors

Fabrizio Pittorino, Antonio Ferraro, Gabriele Perugini, Christoph Feinauer, Carlo Baldassi, Riccardo Zecchina

Abstract

We systematize the approach to the investigation of deep neural network landscapes by basing it on the geometry of the space of implemented functions rather than the space of parameters. Grouping classifiers into equivalence classes, we develop a standardized parameterization in which all symmetries are removed, resulting in a toroidal topology. On this space, we explore the error landscape rather than the loss. This lets us derive a meaningful notion of the flatness of minimizers and of the geodesic paths connecting them. Using different optimization algorithms that sample minimizers with different flatness, we study the mode connectivity and relative distances. Testing a variety of state-of-the-art architectures and benchmark datasets, we confirm the correlation between flatness and generalization performance; we further show that in function space flatter minima are closer to each other and that the barriers along the geodesics connecting them are small. We also find that minimizers found by variants of gradient descent can be connected by zero-error paths composed of two straight lines in parameter space, i.e., polygonal chains with a single bend. We observe similar qualitative results in neural networks with binary weights and activations, providing one of the first results concerning the connectivity in this setting. Our results hinge on symmetry removal and are in remarkable agreement with the rich phenomenology described by some recent analytical studies performed on simple shallow models.
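
The key step in the abstract is to replace each network with a single representative of its equivalence class of functionally identical networks. The sketch below illustrates that idea on a one-hidden-layer ReLU network. It assumes only two symmetries: positive per-unit rescaling and permutation of hidden units. The normalization rule (each unit's incoming weights and bias scaled to unit norm, with the norm moved into the outgoing weight) and the lexicographic sort are illustrative choices. They are not necessarily the paper's exact parameterization, and the names forward and canonicalize are hypothetical. After normalization, each unit's incoming parameters lie on a unit sphere. The sketch does not reproduce how the paper arrives at its toroidal topology.

```python
import math
import random


def relu(z):
    return max(0.0, z)


def forward(W1, b1, w2, x):
    """One-hidden-layer ReLU net: f(x) = sum_j w2[j] * relu(W1[j] . x + b1[j])."""
    return sum(
        a * relu(sum(wi * xi for wi, xi in zip(w, x)) + b)
        for w, b, a in zip(W1, b1, w2)
    )


def canonicalize(W1, b1, w2):
    """Map a network to a symmetry-free representative (illustrative scheme).

    Rescaling: relu(c * z) = c * relu(z) for c > 0, so each hidden unit's
    (incoming weights, bias) vector is scaled to unit norm and the norm is
    moved into its outgoing weight. Assumes no unit has all-zero incoming
    parameters.
    Permutation: units are sorted lexicographically by their normalized
    incoming parameters, so relabeling hidden units gives the same result.
    """
    units = []
    for w, b, a in zip(W1, b1, w2):
        norm = math.sqrt(sum(v * v for v in w) + b * b)
        units.append(([v / norm for v in w], b / norm, a * norm))
    units.sort(key=lambda u: (u[0], u[1]))
    return [u[0] for u in units], [u[1] for u in units], [u[2] for u in units]


# Usage: a random network and a symmetry-equivalent copy of it.
random.seed(0)
n_in, n_hid = 3, 4
W1 = [[random.gauss(0, 1) for _ in range(n_in)] for _ in range(n_hid)]
b1 = [random.gauss(0, 1) for _ in range(n_hid)]
w2 = [random.gauss(0, 1) for _ in range(n_hid)]

scales = [0.5, 2.0, 3.0, 0.1]  # positive rescaling factor for original unit j
perm = [2, 0, 3, 1]            # relabeling (reordering) of hidden units
W1s = [[scales[j] * v for v in W1[j]] for j in perm]
b1s = [scales[j] * b1[j] for j in perm]
w2s = [w2[j] / scales[j] for j in perm]

x = [0.3, -1.2, 0.7]
print(forward(W1, b1, w2, x), forward(W1s, b1s, w2s, x))  # equal up to rounding

canon, canon_s = canonicalize(W1, b1, w2), canonicalize(W1s, b1s, w2s)
```

The two parameter vectors differ, but they implement the same function and canonicalize to the same point up to floating-point rounding. This is the sense in which distances and paths can be measured between functions rather than between raw parameters.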

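The abstract measures flatness on the error landscape rather than on the loss. It also connects minimizers with polygonal chains that have a single bend. The following sketch shows how both quantities could be evaluated for any parameter vector and error function.

It rests on several assumptions:

- The bend sits at t = 0.5.
- Flatness is estimated as the mean increase in classification error under multiplicative Gaussian perturbations of relative size sigma.
- The toy linear classifier and its four data points are hypothetical stand-ins for a trained deep network.

All function names are illustrative and are not taken from the paper's code.

```python
import random


def lerp(a, b, s):
    """Linear interpolation between parameter vectors a and b, s in [0, 1]."""
    return [ai + s * (bi - ai) for ai, bi in zip(a, b)]


def chain_point(theta_a, theta_m, theta_b, t):
    """Point at t in [0, 1] on the chain theta_a -> theta_m -> theta_b (bend at t = 0.5)."""
    if t <= 0.5:
        return lerp(theta_a, theta_m, 2 * t)
    return lerp(theta_m, theta_b, 2 * t - 1)


def errors_along_chain(error_fn, theta_a, theta_m, theta_b, n_points=21):
    """Error evaluated at n_points evenly spaced values of t, endpoints included."""
    ts = [i / (n_points - 1) for i in range(n_points)]
    return [error_fn(chain_point(theta_a, theta_m, theta_b, t)) for t in ts]


def barrier(errors):
    """Highest error along the path, measured above the worse endpoint."""
    return max(errors) - max(errors[0], errors[-1])


def error_flatness(error_fn, theta, sigma, n_samples=200, seed=0):
    """Mean increase in error under multiplicative Gaussian noise of relative size sigma."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        noisy = [w * (1 + sigma * rng.gauss(0, 1)) for w in theta]
        total += error_fn(noisy)
    return total / n_samples - error_fn(theta)


# Hypothetical toy problem: a linear classifier sign(theta . x) on four points.
DATA = [([1.0, 0.5], 1), ([2.0, -0.3], 1), ([-1.0, 0.2], -1), ([-0.5, -1.5], -1)]


def classification_error(theta):
    """Fraction of misclassified points (the error, not a surrogate loss)."""
    wrong = 0
    for x, y in DATA:
        score = sum(w * xi for w, xi in zip(theta, x))
        wrong += (1 if score > 0 else -1) != y
    return wrong / len(DATA)


theta_a, theta_b = [1.0, 0.0], [1.0, 0.3]  # two zero-error solutions
good_bend, bad_bend = [1.0, 0.15], [-1.0, 0.0]

print(barrier(errors_along_chain(classification_error, theta_a, good_bend, theta_b)))  # 0.0
print(barrier(errors_along_chain(classification_error, theta_a, bad_bend, theta_b)))   # 1.0
print(error_flatness(classification_error, theta_a, sigma=0.5))
```

In this toy case, both straight segments through the good bend stay at zero error, because the zero-error linear separators form a convex cone. For deep networks, the hard part is finding a bend point that makes the whole chain zero-error. This sketch does not search for one; it only scores a given chain.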