多块单个探针差异降低了耦合组成优化的估计器

论文标题

多块单个探针差异降低了耦合组成优化的估计器

Multi-block-Single-probe Variance Reduced Estimator for Coupled Compositional Optimization

论文作者

Jiang, Wei, Li, Gang, Wang, Yibo, Zhang, Lijun, Yang, Tianbao

论文摘要

已经对蜘蛛/莎拉/风暴等方差降低技术进行了广泛的研究，以提高随机非凸优化的收敛速率，这些优化通常维护和更新跨迭代中单个函数的估计器序列。如果我们需要在迭代中跟踪多个功能映射，但只有在每次迭代中访问$ \ Mathcal {O}（1）$功能映射的随机样本时，该怎么办？在解决新兴的耦合组成优化问题的家族中有一个重要的应用，其形式是$ \ sum_ {i = 1}^m f_i（g_i（\ mathbf {w}））$，其中$ g_i $可通过随机甲骨文访问$ g_i $。关键问题是跟踪和估计$ \ mathbf g（\ mathbf {w}）=（g_1（\ g_1（\ mathbf {w}），\ ldots，\ ldots，g_m（\ mathbf {w} w}）$跨迭代，其中$ \ mathbf g（prob prob in IS允许iS pobers of Mathbf g（\ mathbf {w}）$ $ \ MATHCAL {O}（1）$块，以达到其随机价值和Jacobians。为了提高解决这些问题的复杂性，我们提出了一种新颖的随机方法，称为多块单个探针差异（MSVR）估计器，以跟踪$ \ mathbf g（\ mathbf {w}）$的序列。它的灵感来自风暴，但引入了定制的误差校正项，不仅可以减轻所选块的随机样品中的噪声，而且还减轻了未进行采样的块中的噪声。在MSVR估计器的帮助下，我们开发了几种算法来解决上述组成问题，并在具有非convex/convex/convex/stryly convex/polyak-olak-olokidjasiewicz（pl）目标的各种环境中改善复杂性。我们的结果在几个方面都改善了先前的结果，包括样本复杂性的顺序以及对强凸参数的依赖。多任务深度AUC最大化的经验研究表明，使用新估计器的性能更好。

Variance reduction techniques such as SPIDER/SARAH/STORM have been extensively studied to improve the convergence rates of stochastic non-convex optimization, which usually maintain and update a sequence of estimators for a single function across iterations. What if we need to track multiple functional mappings across iterations but only with access to stochastic samples of $\mathcal{O}(1)$ functional mappings at each iteration? There is an important application in solving an emerging family of coupled compositional optimization problems in the form of $\sum_{i=1}^m f_i(g_i(\mathbf{w}))$, where $g_i$ is accessible through a stochastic oracle. The key issue is to track and estimate a sequence of $\mathbf g(\mathbf{w})=(g_1(\mathbf{w}), \ldots, g_m(\mathbf{w}))$ across iterations, where $\mathbf g(\mathbf{w})$ has $m$ blocks and it is only allowed to probe $\mathcal{O}(1)$ blocks to attain their stochastic values and Jacobians. To improve the complexity for solving these problems, we propose a novel stochastic method named Multi-block-Single-probe Variance Reduced (MSVR) estimator to track the sequence of $\mathbf g(\mathbf{w})$. It is inspired by STORM but introduces a customized error correction term to alleviate the noise not only in stochastic samples for the selected blocks but also in those blocks that are not sampled. With the help of the MSVR estimator, we develop several algorithms for solving the aforementioned compositional problems with improved complexities across a spectrum of settings with non-convex/convex/strongly convex/Polyak-Łojasiewicz (PL) objectives. Our results improve upon prior ones in several aspects, including the order of sample complexities and dependence on the strong convexity parameter. Empirical studies on multi-task deep AUC maximization demonstrate the better performance of using the new estimator.

下载PDF全文

下载文献需遵守相关版权规定

论文标题