通过反馈图，几乎最佳的最佳世界学习算法用于在线学习

论文标题

通过反馈图，几乎最佳的最佳世界学习算法用于在线学习

Nearly Optimal Best-of-Both-Worlds Algorithms for Online Learning with Feedback Graphs

论文作者

Ito, Shinji, Tsuchiya, Taira, Honda, Junya

论文摘要

这项研究考虑了使用一般的有向反馈图的在线学习。对于这个问题，我们提出了最佳世界算法，这些算法几乎可以为对抗环境以及随机环境的多同性恋遗憾界定。如Alon等。 [2015]表明，紧密的遗憾界限取决于反馈图的结构：可观察到的图形产生$ \tildeθ（α^{1/2} t^{1/2}）$的最小值遗憾，而弱观察到的图形可诱导$ \tildeθ（$ \tildeθ）$ \tildeθ（Δ $δ$分别代表图形的独立数和图形特定部分的支配数。我们提出的强烈可观察图的算法的遗憾为$ \ tilde {o}（α^{1/2} t^{1/2}）$用于对抗环境，以及$ {o}（o}（o}）（\ frac {\ frac {α（\ ln t）^3} $} $} $} $} $δ_ {\ min} $表示最小次优差距。该结果解决了Erez和Koren [2021]提出的一个公开问题。我们还为弱可观察的图提供了一种算法，该算法实现了$ \ tilde {o}（δ^{1/3} t^{2/3}）$的遗憾，以实现对抗环境和对随机环境的多洛格里斯遗憾。所提出的算法基于以下规范化的领导方法，并结合新设计的学习率更新规则。

This study considers online learning with general directed feedback graphs. For this problem, we present best-of-both-worlds algorithms that achieve nearly tight regret bounds for adversarial environments as well as poly-logarithmic regret bounds for stochastic environments. As Alon et al. [2015] have shown, tight regret bounds depend on the structure of the feedback graph: strongly observable graphs yield minimax regret of $\tildeΘ( α^{1/2} T^{1/2} )$, while weakly observable graphs induce minimax regret of $\tildeΘ( δ^{1/3} T^{2/3} )$, where $α$ and $δ$, respectively, represent the independence number of the graph and the domination number of a certain portion of the graph. Our proposed algorithm for strongly observable graphs has a regret bound of $\tilde{O}( α^{1/2} T^{1/2} ) $ for adversarial environments, as well as of $ {O} ( \frac{α(\ln T)^3 }{Δ_{\min}} ) $ for stochastic environments, where $Δ_{\min}$ expresses the minimum suboptimality gap. This result resolves an open question raised by Erez and Koren [2021]. We also provide an algorithm for weakly observable graphs that achieves a regret bound of $\tilde{O}( δ^{1/3}T^{2/3} )$ for adversarial environments and poly-logarithmic regret for stochastic environments. The proposed algorithms are based on the follow-the-regularized-leader approach combined with newly designed update rules for learning rates.

下载PDF全文

下载文献需遵守相关版权规定

论文标题