Paper Title
Randomized Sharpness-Aware Training for Boosting Computational Efficiency in Deep Learning
Paper Authors
论文摘要
By driving models to converge to flat minima, sharpness-aware learning algorithms (such as SAM) have been shown to achieve state-of-the-art performance. However, these algorithms generally incur one extra forward-backward propagation at each training iteration, which substantially increases the computational cost, especially for large-scale models. To address this, we propose a simple yet efficient training scheme called Randomized Sharpness-Aware Training (RST). At each iteration, the optimizer in RST performs a Bernoulli trial to choose between the base algorithm (SGD) and the sharpness-aware algorithm (SAM), with a probability given by a predefined scheduling function. Because base-algorithm steps are mixed in, the overall number of propagation pairs can be greatly reduced. We also provide a theoretical analysis of the convergence of RST. We then empirically study the computational cost and effect of various types of scheduling functions, and offer guidance on choosing appropriate scheduling functions. Furthermore, we extend RST to a general framework (G-RST), in which the degree of sharpness regularization can be adjusted freely for any scheduling function. We show that G-RST outperforms SAM in most cases while saving 50\% of the extra computation cost.
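To make the randomized scheme concrete, below is a minimal sketch of one possible RST-style training loop in PyTorch, assuming a standard SAM-like two-pass update. The linear `schedule` function, the perturbation radius `rho`, and the synthetic data are illustrative placeholders, not the paper's exact configuration.

```python
# Sketch of Randomized Sharpness-Aware Training (RST): at each iteration,
# a Bernoulli trial decides between a plain SGD step and a SAM step.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()
rho = 0.05          # SAM perturbation radius (assumed value)
total_steps = 100

def schedule(t):
    """Probability of taking a SAM step at iteration t (linear ramp, assumed)."""
    return t / total_steps

for t in range(total_steps):
    x = torch.randn(64, 10)                  # placeholder batch
    y = torch.randint(0, 2, (64,))

    # Bernoulli trial with probability from the scheduling function.
    use_sam = torch.bernoulli(torch.tensor(schedule(t))).item() == 1

    if use_sam:
        # First forward-backward pass: gradient at the current weights.
        loss_fn(model(x), y).backward()
        grads = [p.grad.detach().clone() for p in model.parameters()]
        grad_norm = torch.norm(torch.stack([g.norm() for g in grads]))
        eps = [rho * g / (grad_norm + 1e-12) for g in grads]
        # Ascend to the (approximate) worst-case point within radius rho.
        with torch.no_grad():
            for p, e in zip(model.parameters(), eps):
                p.add_(e)
        optimizer.zero_grad()
        # Second forward-backward pass: gradient at the perturbed weights.
        loss_fn(model(x), y).backward()
        with torch.no_grad():
            for p, e in zip(model.parameters(), eps):
                p.sub_(e)                     # restore the original weights
        optimizer.step()                      # sharpness-aware update
    else:
        # Base algorithm: a single forward-backward pass (SGD step).
        loss_fn(model(x), y).backward()
        optimizer.step()

    optimizer.zero_grad()
```

With a linear ramp like the one above, roughly half of the iterations skip the second propagation pair, which is where the computational saving comes from; other scheduling functions trade cost against the strength of the sharpness regularization.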