Paper Title
Policy Entropy for Out-of-Distribution Classification
Paper Authors
Paper Abstract
One critical prerequisite for deploying reinforcement learning systems in the real world is the ability to reliably detect situations on which the agent was not trained. Such situations pose potential safety risks when wrong predictions lead to the execution of harmful actions. In this work, we propose PEOC, a new policy-entropy-based out-of-distribution classifier that reliably detects unencountered states in deep reinforcement learning. It uses the entropy of an agent's policy as the classification score of a one-class classifier. We evaluate our approach using a procedural environment generator. Results show that PEOC is highly competitive with state-of-the-art one-class classification algorithms on the evaluated environments. Furthermore, we present a structured process for benchmarking out-of-distribution classification in reinforcement learning.
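To make the core idea concrete, the sketch below shows one plausible way to turn policy entropy into a one-class classification score for a discrete-action policy. This is a minimal illustration, not the authors' implementation: the `PEOCClassifier` class, the quantile-based threshold calibration, and all identifiers are assumptions introduced here for clarity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def policy_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Shannon entropy H(pi(.|s)) of a categorical policy, from raw action logits."""
    log_probs = F.log_softmax(logits, dim=-1)
    return -(log_probs.exp() * log_probs).sum(dim=-1)


class PEOCClassifier:
    """One-class classifier scoring states by policy entropy: high entropy on a
    state is taken as evidence the agent was not trained on it (illustrative)."""

    def __init__(self, policy_net: nn.Module):
        self.policy_net = policy_net
        self.threshold = None

    def fit(self, in_dist_states: torch.Tensor, quantile: float = 0.95) -> None:
        # Calibrate the decision threshold on states assumed in-distribution,
        # e.g. states visited during training (quantile choice is an assumption).
        with torch.no_grad():
            h = policy_entropy(self.policy_net(in_dist_states))
        self.threshold = torch.quantile(h, quantile).item()

    def predict(self, states: torch.Tensor) -> torch.Tensor:
        # Returns True for states whose policy entropy exceeds the calibrated
        # threshold, i.e. states classified as out-of-distribution.
        with torch.no_grad():
            h = policy_entropy(self.policy_net(states))
        return h > self.threshold


# Usage with a toy policy network (4-dim observations, 3 discrete actions):
policy = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 3))
clf = PEOCClassifier(policy)
clf.fit(torch.randn(1000, 4))          # calibrate on in-distribution states
print(clf.predict(torch.randn(5, 4)))  # OOD flags for new states
```

Calibrating the threshold as an entropy quantile over in-distribution states is one standard one-class approach; the paper's benchmarking process evaluates such classifiers against state-of-the-art one-class baselines.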