Paper Title
Gradient Descent in the Absence of Global Lipschitz Continuity of the Gradients
Paper Authors
Paper Abstract
Gradient descent (GD) is a collection of continuous optimization methods that have achieved immeasurable success in practice. Owing to data science applications, GD with diminishing step sizes has become a prominent variant. While this variant of GD has been well studied in the literature for objectives with globally Lipschitz continuous gradients or under the requirement of bounded iterates, objectives from data science problems do not satisfy such assumptions. Thus, in this work, we provide a novel global convergence analysis of GD with diminishing step sizes for differentiable nonconvex functions whose gradients are only locally Lipschitz continuous. Through our analysis, we generalize what is known about gradient descent with diminishing step sizes, including interesting topological facts, and we elucidate the varied behaviors that can occur in the previously overlooked divergence regime. Thus, we provide the most general global convergence analysis of GD with diminishing step sizes under realistic conditions for data science problems.
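
A minimal illustrative sketch in Python of the algorithm the abstract studies: gradient descent with diminishing step sizes, applied here to f(x) = x**4 / 4, whose gradient x**3 is locally but not globally Lipschitz continuous. The step-size schedule 0.1 / k and the starting point are hypothetical choices for illustration, not taken from the paper.

# Objective f(x) = x**4 / 4; its gradient x**3 is only locally Lipschitz continuous.
def grad(x):
    return x ** 3

x = 2.0  # hypothetical starting point
for k in range(1, 10_001):
    eta = 0.1 / k          # diminishing steps: sum(eta) diverges, sum(eta**2) converges
    x -= eta * grad(x)     # gradient descent update

print(f"final iterate: {x:.6f}")  # slowly approaches the stationary point x = 0

With the same schedule but a larger starting point (for example x = 10.0), the first step overshoots and the iterates blow up; behaviors of this kind are what the abstract's divergence regime refers to.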