Paper Title

Latent Representation Prediction Networks

Paper Authors

Hlynsson, Hlynur Davíð, Schüler, Merlin, Schiewer, Robin, Glasmachers, Tobias, Wiskott, Laurenz

Paper Abstract

Deeply-learned planning methods are often based on learning representations that are optimized for unrelated tasks. For example, they might be trained on reconstructing the environment. These representations are then combined with predictor functions for simulating rollouts to navigate the environment. We find this principle of learning representations unsatisfying and propose to learn them such that they are directly optimized for the task at hand: to be maximally predictable for the predictor function. This results in representations that are by design optimal for the downstream task of planning, where the learned predictor function is used as a forward model. To this end, we propose a new way of jointly learning this representation along with the prediction function, a system we dub Latent Representation Prediction Network (LARP). The prediction function is used as a forward model for search on a graph in a viewpoint-matching task and the representation learned to maximize predictability is found to outperform a pre-trained representation. Our approach is shown to be more sample-efficient than standard reinforcement learning methods and our learned representation transfers successfully to dissimilar objects.
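The central idea — jointly training an encoder φ and a predictor f so that the latent representation is maximally predictable, i.e. f(φ(s_t), a_t) ≈ φ(s_{t+1}) — can be illustrated at toy scale. Everything below is an assumption for illustration only: the linear maps, the synthetic transitions, the finite-difference optimizer, and the variance penalty (one common way to rule out the trivial constant encoder) are stand-ins, not the paper's actual architecture or loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (illustrative; the paper uses deep networks on images).
obs_dim, act_dim, lat_dim, n = 8, 2, 4, 64

# Synthetic transition tuples (s_t, a_t, s_{t+1}); a real agent collects these.
S = rng.normal(size=(n, obs_dim))
A = rng.normal(size=(n, act_dim))
S_next = S + 0.1 * rng.normal(size=(n, obs_dim))

# Jointly learned parameters: linear encoder W_e and linear predictor W_p.
W_e = rng.normal(scale=0.3, size=(lat_dim, obs_dim))
W_p = rng.normal(scale=0.3, size=(lat_dim, lat_dim + act_dim))

def objective():
    Z = S @ W_e.T                                    # phi(s_t)
    Z_next = S_next @ W_e.T                          # phi(s_{t+1})
    Z_pred = np.concatenate([Z, A], axis=1) @ W_p.T  # f(phi(s_t), a_t)
    pred_loss = np.mean((Z_pred - Z_next) ** 2)      # predictability objective
    # Variance penalty so the all-zero encoder is not optimal
    # (a common anti-collapse choice; the paper's regularizer may differ).
    var_penalty = np.mean((1.0 - Z.std(axis=0)) ** 2)
    return pred_loss + var_penalty

def num_grad(W, eps=1e-5):
    """Finite-difference gradient; fine at this toy parameter count."""
    g = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        old = W[idx]
        W[idx] = old + eps; hi = objective()
        W[idx] = old - eps; lo = objective()
        W[idx] = old
        g[idx] = (hi - lo) / (2 * eps)
    return g

loss0 = objective()
for _ in range(200):                  # joint gradient descent on both modules
    W_e -= 0.02 * num_grad(W_e)
    W_p -= 0.02 * num_grad(W_p)
loss1 = objective()
print(loss0, loss1)                   # loss decreases as phi becomes predictable
```

Once trained, the predictor doubles as a forward model: starting from φ(s_0), candidate action sequences can be rolled out entirely in latent space and scored against a goal representation, which is how the abstract's graph search uses it.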
