Paper Title
Should Models Be Accurate?
Paper Authors
Paper Abstract
Model-based Reinforcement Learning (MBRL) holds promise for data efficiency by planning with model-generated experience in addition to learning from experience in the environment. However, in complex or changing environments, models in MBRL will inevitably be imperfect, and their detrimental effects on learning can be difficult to mitigate. In this work, we question whether the objective of these models should be the accurate simulation of environment dynamics at all. We focus our investigations on Dyna-style planning in a prediction setting. First, we highlight and support three motivating points: a perfectly accurate model of environment dynamics is not practically achievable, is not necessary, and is not always the most useful anyway. Second, we introduce a meta-learning algorithm for training models with a focus on their usefulness to the learner instead of their accuracy in modelling the environment. Our experiments show that in a simple non-stationary environment, our algorithm enables faster learning than even an accurate model built with domain-specific knowledge of the non-stationarity.
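For readers unfamiliar with Dyna-style planning in a prediction setting, the following is a minimal sketch of the general idea: a TD(0) value learner that updates from each real transition and then performs a few extra updates on transitions replayed from a learned model. It is not the paper's meta-learning algorithm. The random-walk environment, the sample-based model, the function name `dyna_td_prediction`, and all hyperparameters are assumptions chosen purely for illustration.

```python
import random

def dyna_td_prediction(num_states=5, episodes=200, planning_steps=5,
                       alpha=0.1, gamma=1.0, seed=0):
    """Illustrative Dyna-style TD(0) value prediction on a random-walk chain."""
    rng = random.Random(seed)
    values = [0.0] * num_states   # value estimate for each non-terminal state
    model = {}                    # state -> list of observed (reward, next_state)

    def td_update(s, r, s_next):
        # s_next is None on termination, so the target is just the reward.
        bootstrap = gamma * values[s_next] if s_next is not None else 0.0
        values[s] += alpha * (r + bootstrap - values[s])

    for _ in range(episodes):
        s = num_states // 2       # every episode starts in the middle state
        while s is not None:
            nxt = s + rng.choice((-1, 1))
            if nxt < 0:
                r, s_next = 0.0, None      # left exit: reward 0
            elif nxt >= num_states:
                r, s_next = 1.0, None      # right exit: reward 1
            else:
                r, s_next = 0.0, nxt

            td_update(s, r, s_next)                       # learn from real experience
            model.setdefault(s, []).append((r, s_next))   # update the sample model

            # Planning: extra TD updates on model-generated transitions.
            for _ in range(planning_steps):
                ps = rng.choice(list(model))
                pr, ps_next = rng.choice(model[ps])
                td_update(ps, pr, ps_next)

            s = s_next
    return values

if __name__ == "__main__":
    print(dyna_td_prediction())
```

In this sketch the model simply replays transitions it has observed, so it plays the role of an "accurate" model in a stationary environment. The abstract's proposal is to train the model for its usefulness to the learner instead, which would change how the planning transitions are produced while leaving the TD update itself unchanged.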