Title
A Quantitative Geometric Approach to Neural-Network Smoothness
Authors
Abstract
Fast and precise Lipschitz constant estimation of neural networks is an important task for deep learning. Researchers have recently found an intrinsic trade-off between the accuracy and smoothness of neural networks, so training a network with a loose Lipschitz constant estimate imposes strong regularization and can significantly hurt model accuracy. In this work, we provide a unified theoretical framework, a quantitative geometric approach, to address Lipschitz constant estimation. By adopting this framework, we immediately obtain several theoretical results, including the computational hardness of Lipschitz constant estimation and its approximability. Furthermore, the quantitative geometric perspective also offers insights into the recent empirical observation that techniques designed for one norm usually do not transfer to another. We implement the algorithms induced by this quantitative geometric approach in a tool, GeoLIP. These algorithms are based on semidefinite programming (SDP). Our empirical evaluation demonstrates that GeoLIP is more scalable and precise than existing tools for Lipschitz constant estimation under $\ell_\infty$-perturbations. Furthermore, we show its intricate relations with other recent SDP-based techniques, both theoretically and empirically. We believe that this unified quantitative geometric perspective can bring new insights and theoretical tools to the investigation of neural-network smoothness and robustness.
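For context, the quantity being estimated is the Lipschitz constant of the network $f$ with respect to a norm: the smallest $L$ such that $\|f(x)-f(y)\| \le L\,\|x-y\|$ for all inputs $x, y$. The sketch below is not GeoLIP's SDP method; it illustrates the classical loose baseline the abstract contrasts against: the product of per-layer operator norms for a ReLU network under $\ell_\infty$-perturbations. The function names and the `weights` list are illustrative, assuming a plain fully-connected ReLU network.

```python
# Minimal sketch (assumption: a plain ReLU MLP, not GeoLIP's SDP algorithm):
# the trivial layer-product upper bound on the l_inf -> l_inf Lipschitz constant.
# ReLU is 1-Lipschitz in every p-norm, so the product of per-layer operator
# norms soundly upper-bounds the network's Lipschitz constant, typically loosely.
import numpy as np

def linf_operator_norm(W: np.ndarray) -> float:
    """||W||_{inf->inf}: the maximum absolute row sum of W."""
    return float(np.abs(W).sum(axis=1).max())

def naive_linf_lipschitz_bound(weights: list[np.ndarray]) -> float:
    """Product of per-layer l_inf operator norms for a ReLU MLP."""
    bound = 1.0
    for W in weights:
        bound *= linf_operator_norm(W)
    return bound

# Usage: a random two-layer network with input dim 32, hidden dim 64, output dim 10.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((64, 32)), rng.standard_normal((10, 64))]
print(naive_linf_lipschitz_bound(weights))
```

Because this product bound can be far above the true Lipschitz constant, training against it over-regularizes the network, which is the accuracy cost the abstract describes; tighter SDP-based estimates such as GeoLIP's aim to close that gap.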