Paper Title

Robustness in deep learning: The good (width), the bad (depth), and the ugly (initialization)

Paper Authors

Zhenyu Zhu, Fanghui Liu, Grigorios G. Chrysos, Volkan Cevher

Paper Abstract

We study the average robustness notion in deep neural networks in (selected) wide and narrow, deep and shallow, as well as lazy and non-lazy training settings. We prove that in the under-parameterized setting, width has a negative effect, while it improves robustness in the over-parameterized setting. The effect of depth closely depends on the initialization and the training mode. In particular, when initialized with LeCun initialization, depth helps robustness in the lazy training regime. In contrast, when initialized with Neural Tangent Kernel (NTK) and He initialization, depth hurts robustness. Moreover, under the non-lazy training regime, we demonstrate how the width of a two-layer ReLU network benefits robustness. Our theoretical developments improve the results of [Huang et al., NeurIPS21; Wu et al., NeurIPS21] and are consistent with [Bubeck and Sellke, NeurIPS21; Bubeck et al., COLT21].
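
For reference, the three initialization schemes contrasted in the abstract follow standard scalings; the definitions below are the usual textbook conventions, stated here for convenience rather than quoted from the paper:

LeCun initialization: W_ij ~ N(0, 1/m), where m is the layer's fan-in.
He initialization: W_ij ~ N(0, 2/m).
NTK parameterization: W_ij ~ N(0, 1), with each layer's pre-activation rescaled by 1/sqrt(m).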
