Title
k-Means Maximum Entropy Exploration
Authors
Abstract
Exploration in high-dimensional, continuous spaces with sparse rewards is an open problem in reinforcement learning. Artificial curiosity algorithms address this by creating rewards that lead to exploration. Given a reinforcement learning algorithm capable of maximizing rewards, the problem reduces to finding an optimization objective consistent with exploration. Maximum entropy exploration uses the entropy of the state visitation distribution as such an objective. However, efficiently estimating the entropy of the state visitation distribution is challenging in high-dimensional, continuous spaces. We introduce an artificial curiosity algorithm based on lower bounding an approximation to the entropy of the state visitation distribution. The bound relies on a result we prove for non-parametric density estimation in arbitrary dimensions using k-means. We show that our approach is both computationally efficient and competitive on benchmarks for exploration in high-dimensional, continuous spaces, especially on tasks where reinforcement learning algorithms are unable to find rewards.
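Maximum entropy exploration rewards the agent for spreading its state visitation distribution d, whose entropy is H(d) = -E_{s~d}[log d(s)]. The abstract does not give the exact form of the paper's k-means lower bound, so the sketch below does not implement it. It is a hypothetical, generic illustration of the related idea of using k-means over visited states to produce an intrinsic reward. The class name `OnlineKMeansNovelty` and the choice of reward (squared Euclidean distance from a new state to its nearest centroid, computed before the update) are assumptions made for illustration. Centroids are maintained with sequential (running-mean) k-means updates, so each step costs O(k * dim).

```python
class OnlineKMeansNovelty:
    """Hypothetical online k-means novelty reward (illustration only, not the paper's bound).

    Keeps up to k centroids over visited states. The reward for a new state is its
    squared Euclidean distance to the nearest existing centroid, measured before the
    centroids are updated, so states far from every cluster count as novel.
    """

    def __init__(self, k, dim):
        if k < 1 or dim < 1:
            raise ValueError("k and dim must be positive")
        self.k = k
        self.dim = dim
        self.centroids = []  # each centroid is a list of floats
        self.counts = []     # number of states assigned to each centroid

    @staticmethod
    def _sq_dist(a, b):
        # Squared Euclidean distance between two equal-length vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def reward_and_update(self, state):
        s = [float(x) for x in state]
        if len(s) != self.dim:
            raise ValueError("state dimension mismatch")

        # Reward: distance to the nearest centroid (0.0 before any cluster exists).
        if not self.centroids:
            reward = 0.0
        else:
            dists = [self._sq_dist(s, c) for c in self.centroids]
            j = min(range(len(dists)), key=dists.__getitem__)
            reward = dists[j]

        if len(self.centroids) < self.k:
            # Seed a new cluster with this state until k clusters exist.
            self.centroids.append(s)
            self.counts.append(1)
        else:
            # Sequential k-means: move the nearest centroid toward s by 1/n,
            # which keeps each centroid equal to the mean of its assigned states.
            self.counts[j] += 1
            eta = 1.0 / self.counts[j]
            c = self.centroids[j]
            self.centroids[j] = [ci + eta * (si - ci) for ci, si in zip(c, s)]
        return reward


if __name__ == "__main__":
    novelty = OnlineKMeansNovelty(k=2, dim=2)
    print(novelty.reward_and_update([0.0, 0.0]))   # 0.0 (no clusters yet)
    print(novelty.reward_and_update([3.0, 4.0]))   # 25.0
    print(novelty.reward_and_update([0.0, 0.0]))   # 0.0 (revisits a cluster center)
    print(novelty.reward_and_update([10.0, 0.0]))  # 65.0 (nearest centroid is [3, 4])
```

In an agent, this reward would typically be added to the (possibly zero) extrinsic reward before being passed to the reinforcement learning algorithm. A bound-based method such as the one the abstract describes would instead derive the reward from how much a new state increases an entropy estimate built from the clusters.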