Paper Title
Evaluating the Performance of Reinforcement Learning Algorithms
Authors
Abstract
Performance evaluations are critical for quantifying algorithmic advances in reinforcement learning. Recent reproducibility analyses have shown that reported performance results are often inconsistent and difficult to replicate. In this work, we argue that the inconsistency of performance stems from the use of flawed evaluation metrics. Taking a step towards ensuring that reported results are consistent, we propose a new comprehensive evaluation methodology for reinforcement learning algorithms that produces reliable measurements of performance both on a single environment and when aggregated across environments. We demonstrate this method by evaluating a broad class of reinforcement learning algorithms on standard benchmark tasks.