Paper Title

Reactive Exploration to Cope with Non-Stationarity in Lifelong Reinforcement Learning

Authors

Steinparz, Christian, Schmied, Thomas, Paischer, Fabian, Dinu, Marius-Constantin, Patil, Vihang, Bitto-Nemling, Angela, Eghbal-zadeh, Hamid, Hochreiter, Sepp

Abstract

In lifelong learning, an agent learns throughout its entire life without resets, in a constantly changing environment, as we humans do. Consequently, lifelong learning comes with a plethora of research problems such as continual domain shifts, which result in non-stationary rewards and environment dynamics. These non-stationarities are difficult to detect and cope with due to their continuous nature. Therefore, exploration strategies and learning methods are required that are capable of tracking the steady domain shifts, and adapting to them. We propose Reactive Exploration to track and react to continual domain shifts in lifelong reinforcement learning, and to update the policy correspondingly. To this end, we conduct experiments in order to investigate different exploration strategies. We empirically show that representatives of the policy-gradient family are better suited for lifelong learning, as they adapt more quickly to distribution shifts than Q-learning. Thereby, policy-gradient methods profit the most from Reactive Exploration and show good results in lifelong learning with continual domain shifts. Our code is available at: https://github.com/ml-jku/reactive-exploration.
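
The abstract does not spell out how a domain shift is detected; one common way to build such a "reactive" exploration signal is a prediction-error (curiosity-style) intrinsic reward, where a learned dynamics model's error spikes when the environment changes and the agent is pushed to re-explore. The sketch below is a minimal illustration of that general idea under these assumptions, not the authors' implementation; the names DynamicsModel, intrinsic_reward, and beta are placeholders.

```python
# Minimal sketch (not the paper's implementation): a forward-dynamics model whose
# prediction error serves as an intrinsic reward bonus. When the environment
# dynamics drift, the prediction error rises, signalling non-stationarity and
# driving renewed exploration. All names here are illustrative assumptions.

import torch
import torch.nn as nn

class DynamicsModel(nn.Module):
    """Predicts the next observation from (observation, action)."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, obs_dim),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1))


def intrinsic_reward(model, obs, act, next_obs, beta: float = 0.1):
    """Prediction error of the dynamics model, scaled by beta."""
    with torch.no_grad():
        pred = model(obs, act)
        return beta * ((pred - next_obs) ** 2).mean(dim=-1)


# Usage: add the bonus to the extrinsic reward and keep the model up to date.
obs_dim, act_dim = 4, 2
model = DynamicsModel(obs_dim, act_dim)
optim = torch.optim.Adam(model.parameters(), lr=1e-3)

obs = torch.randn(32, obs_dim)       # batch of transitions (placeholder data)
act = torch.randn(32, act_dim)
next_obs = torch.randn(32, obs_dim)
ext_reward = torch.randn(32)

r_total = ext_reward + intrinsic_reward(model, obs, act, next_obs)

# Online update: after a domain shift this loss (and hence the bonus) spikes.
loss = ((model(obs, act) - next_obs) ** 2).mean()
optim.zero_grad()
loss.backward()
optim.step()
```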
