Paper Title
Exploration in Action Space
Paper Authors
Paper Abstract
Parameter space exploration methods with black-box optimization have recently been shown to outperform state-of-the-art approaches in continuous control reinforcement learning domains. In this paper, we examine why these methods work better and the situations in which they perform worse than traditional action space exploration methods. Through a simple theoretical analysis, we show that when the parametric complexity required to solve the reinforcement learning problem is greater than the product of action space dimensionality and horizon length, exploration in action space is preferred. We also show this empirically by comparing simple exploration methods on several toy problems.
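To make the contrast concrete, the sketch below shows where each scheme injects noise: action-space exploration perturbs every action along a trajectory, while parameter-space exploration perturbs the policy parameters once per episode and improves them with a black-box (evolution-strategies-style) estimator. This is a minimal illustration under assumed choices; the toy environment, hyperparameters, and helper names are not the paper's experimental setup.

```python
import numpy as np

# Toy problem (illustrative assumption, not from the paper): the optimal
# action is a fixed linear function of the observation, so a linear policy
# with theta == TARGET_W.flatten() is optimal.
OBS_DIM, ACTION_DIM, HORIZON = 4, 2, 20
rng = np.random.default_rng(0)
TARGET_W = rng.standard_normal((ACTION_DIM, OBS_DIM))

def env_reset():
    return rng.standard_normal(OBS_DIM)

def env_step(obs, action):
    reward = -float(np.sum((action - TARGET_W @ obs) ** 2))
    return rng.standard_normal(OBS_DIM), reward  # next observation, reward

def policy(theta, obs):
    """Deterministic linear policy; theta is the weight matrix, flattened."""
    return theta.reshape(ACTION_DIM, OBS_DIM) @ obs

def rollout(theta, action_noise=0.0):
    """One episode. action_noise > 0 gives action-space exploration: fresh
    Gaussian noise is added to the action at every step, so the perturbation
    lives in an (ACTION_DIM * HORIZON)-dimensional space per episode."""
    obs, total = env_reset(), 0.0
    for _ in range(HORIZON):
        action = policy(theta, obs)
        if action_noise > 0.0:
            action = action + action_noise * rng.standard_normal(ACTION_DIM)
        obs, reward = env_step(obs, action)
        total += reward
    return total

def es_update(theta, sigma=0.1, pop=32, lr=0.02):
    """Parameter-space exploration (evolution-strategies style): perturb the
    whole parameter vector once per episode, act deterministically within
    the episode, and ascend a finite-difference estimate of the return."""
    eps = rng.standard_normal((pop, theta.size))
    returns = np.array([rollout(theta + sigma * e) for e in eps])
    adv = (returns - returns.mean()) / (returns.std() + 1e-8)
    return theta + (lr / (pop * sigma)) * eps.T @ adv

theta = np.zeros(ACTION_DIM * OBS_DIM)
for _ in range(200):
    theta = es_update(theta)
print("return after ES training:", rollout(theta))
```

On this toy instance the parameter count is dim(theta) = 8, while the per-episode action-noise perturbation space has dimension |A| * H = 2 * 20 = 40; by the abstract's criterion, parameter-space exploration searches the lower-dimensional space here, and the preference would flip once the policy's parametric complexity exceeds |A| * H.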