马尔可夫链随机梯度方法的稳定性和概括

论文标题

马尔可夫链随机梯度方法的稳定性和概括

Stability and Generalization for Markov Chain Stochastic Gradient Methods

论文作者

Wang, Puyu, Lei, Yunwen, Ying, Yiming, Zhou, Ding-Xuan

论文摘要

最近，有大量的工作致力于研究马尔可夫链随机梯度方法（MC-SGMS），该方法主要集中于他们解决最小化问题的收敛分析。在本文中，我们通过统计学习理论框架中的算法稳定性镜头对MC-SGM进行了全面的MC-SGMS分析。对于经验风险最小化（ERM）问题，我们通过引入实用的论点稳定性来建立平稳和非平滑案例的最佳人口风险界限。对于最小值问题，我们建立了在平均参数稳定性和概括误差之间的定量连接，该连接扩展了均匀稳定性的现有结果\ cite {lei2021Stocity}。我们进一步开发了预期和高概率的凸孔问题问题的第一个几乎最佳的收敛速率，这与我们的稳定性结果相结合，表明，对于平滑和非平滑案例，都可以达到最佳的概括界限。据我们所知，这是对梯度从马尔可夫过程采样时对SGM的首次概括分析。

Recently there is a large amount of work devoted to the study of Markov chain stochastic gradient methods (MC-SGMs) which mainly focus on their convergence analysis for solving minimization problems. In this paper, we provide a comprehensive generalization analysis of MC-SGMs for both minimization and minimax problems through the lens of algorithmic stability in the framework of statistical learning theory. For empirical risk minimization (ERM) problems, we establish the optimal excess population risk bounds for both smooth and non-smooth cases by introducing on-average argument stability. For minimax problems, we develop a quantitative connection between on-average argument stability and generalization error which extends the existing results for uniform stability \cite{lei2021stability}. We further develop the first nearly optimal convergence rates for convex-concave problems both in expectation and with high probability, which, combined with our stability results, show that the optimal generalization bounds can be attained for both smooth and non-smooth cases. To the best of our knowledge, this is the first generalization analysis of SGMs when the gradients are sampled from a Markov process.

下载PDF全文

下载文献需遵守相关版权规定

论文标题