Investigating Compounding Prediction Errors in Learned Dynamics Models

Paper title

Investigating Compounding Prediction Errors in Learned Dynamics Models

Authors

Nathan Lambert, Kristofer Pister, Roberto Calandra

Abstract

Accurately predicting the consequences of agents' actions is a key prerequisite for planning in robotic control. Model-based reinforcement learning (MBRL) is one paradigm that relies on the iterative learning and prediction of state-action transitions to solve a task. Deep MBRL has become a popular candidate, using a neural network to learn a dynamics model that, with each forward pass, maps a high-dimensional state and action to a prediction of the next state. These "one-step" predictions are known to become inaccurate over longer horizons of composed predictions, a phenomenon called the compounding error problem. Given the prevalence of the compounding error problem in MBRL and related fields of data-driven control, we set out to understand the properties of, and the conditions causing, these long-horizon errors. In this paper, we explore how the subcomponents of a control problem (choosing a system, collecting data, and training a model) affect long-term prediction error. These detailed quantitative studies on simulated and real-world data show that the underlying dynamics of a system are the strongest factor determining the shape and magnitude of prediction error. Given a clearer understanding of compounding prediction error, researchers can implement new types of models beyond "one-step" that are more useful for control.
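The compounding error problem can be illustrated with a toy example. The sketch below is not the paper's experimental setup; it assumes hypothetical scalar linear systems x' = a*x + b*u in place of a neural-network dynamics model, and the names `linear_model`, `rollout`, and `horizon_errors`, as well as all parameter values, are illustrative assumptions. A "learned" model with a small parameter error (a + 0.01) is composed with itself over a 50-step horizon, each predicted state being fed back as the next input, and its predictions are compared against the "true" system. With the same one-step error in both cases, the gap stays bounded for stable true dynamics (a = 0.90) but grows for unstable ones (a = 1.02), loosely echoing the abstract's point that the underlying dynamics shape the prediction error.

```python
from typing import Callable, List, Sequence

# A one-step model maps (state, action) -> predicted next state.
Model = Callable[[float, float], float]


def linear_model(a: float, b: float) -> Model:
    """Build a hypothetical scalar one-step model x' = a*x + b*u."""
    def step(x: float, u: float) -> float:
        return a * x + b * u
    return step


def rollout(model: Model, x0: float, actions: Sequence[float]) -> List[float]:
    """Compose one-step predictions: each predicted state is fed back as the next input."""
    states = [x0]
    for u in actions:
        states.append(model(states[-1], u))
    return states


def horizon_errors(true_model: Model, learned_model: Model,
                   x0: float, actions: Sequence[float]) -> List[float]:
    """Absolute prediction error at each horizon step h = 0..len(actions)."""
    truth = rollout(true_model, x0, actions)
    pred = rollout(learned_model, x0, actions)
    return [abs(p - t) for p, t in zip(pred, truth)]


if __name__ == "__main__":
    actions = [0.1] * 50  # hypothetical constant open-loop action sequence
    # Same 0.01 parameter error in both cases; only the true dynamics differ.
    for name, a_true in [("stable", 0.90), ("unstable", 1.02)]:
        errs = horizon_errors(linear_model(a_true, 0.5),
                              linear_model(a_true + 0.01, 0.5),
                              x0=1.0, actions=actions)
        print(f"{name}: 1-step error {errs[1]:.4f}, 50-step error {errs[-1]:.4f}")
```

Both runs start with the same one-step error of 0.01; only the long-horizon behavior differs. Swapping `linear_model` for any other callable with the `Model` signature (for example, a wrapper around a trained network) leaves `rollout` and `horizon_errors` unchanged.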
