Paper Title
Gradient Descent Averaging and Primal-dual Averaging for Strongly Convex Optimization
Paper Authors
Paper Abstract
The averaging scheme has attracted extensive attention in deep learning as well as in traditional machine learning. It achieves theoretically optimal convergence and also improves empirical model performance. However, there is still a lack of sufficient convergence analysis for strongly convex optimization. Typically, the convergence of the last iterate of gradient descent methods, referred to as individual convergence, fails to attain optimality due to the presence of a logarithmic factor. To remove this factor, we first develop gradient descent averaging (GDA), a general projection-based dual averaging algorithm for the strongly convex setting. We further present primal-dual averaging for strongly convex cases (SC-PDA), in which primal and dual averaging schemes are utilized simultaneously. We prove that GDA yields the optimal convergence rate in terms of output averaging, while SC-PDA attains the optimal individual convergence. Several experiments on SVMs and deep learning models validate the correctness of the theoretical analysis and the effectiveness of the algorithms.
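To make the "averaging scheme" and "output averaging vs. individual (last-iterate) convergence" terminology concrete, here is a minimal sketch of projected (sub)gradient descent with running output averaging for a strongly convex objective. This is not the paper's GDA or SC-PDA algorithm; the step size 1/(mu*t), the `grad` and `project` callables, and the running-average update are illustrative assumptions only.

```python
import numpy as np

def projected_gd_with_averaging(grad, project, x0, mu, T):
    """Illustrative sketch (NOT the paper's GDA/SC-PDA):
    projected (sub)gradient descent on a mu-strongly convex objective,
    returning both the last iterate and the averaged output."""
    x = np.array(x0, dtype=float)
    x_avg = x.copy()
    for t in range(1, T + 1):
        g = grad(x)                          # (sub)gradient at the current iterate
        x = project(x - g / (mu * t))        # assumed O(1/(mu t)) step size, then projection
        x_avg = ((t - 1) * x_avg + x) / t    # running average of iterates (output averaging)
    return x, x_avg                          # last iterate vs. averaged output

# Example usage: minimize f(x) = 0.5 * mu * ||x - c||^2 over the unit ball.
mu, c = 1.0, np.array([2.0, 0.0])
grad = lambda x: mu * (x - c)
project = lambda x: x / max(1.0, np.linalg.norm(x))   # projection onto the unit ball
x_last, x_out = projected_gd_with_averaging(grad, project, np.zeros(2), mu, T=1000)
```

In this generic scheme the averaged output typically enjoys cleaner convergence guarantees than the last iterate, which is exactly the gap (an extra logarithmic factor in the individual convergence rate) that the paper's GDA and SC-PDA are designed to close.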