Understanding Sample Generation Strategies for Learning Heuristic Functions in Classical Planning

Paper title

Understanding Sample Generation Strategies for Learning Heuristic Functions in Classical Planning

Authors

Bettker, R. V., Minini, P. P., Pereira, A. G., Ritt, M.

Abstract

We study the problem of learning good heuristic functions for classical planning tasks with neural networks, based on samples represented by states with their cost-to-goal estimates. The heuristic function is learned for a given state space and goal condition, with the number of samples limited to a fraction of the size of the state space, and it must generalize well to all states of the state space with the same goal condition. Our main goal is to better understand the influence of sample generation strategies on the performance of a greedy best-first heuristic search (GBFS) guided by a learned heuristic function. In a set of controlled experiments, we find that two main factors determine the quality of the learned heuristic: the algorithm used to generate the sample set, and how close the sample estimates are to the perfect cost-to-goal. These two factors are dependent: having perfect cost-to-goal estimates is insufficient if the samples are not well distributed across the state space. We also study other effects, such as adding samples with high-value estimates. Based on our findings, we propose practical strategies to improve the quality of learned heuristics: three strategies that aim to generate more representative states and two strategies that improve the cost-to-goal estimates. Our practical strategies result in a learned heuristic that, when guiding a GBFS algorithm, increases the mean coverage by more than 30% compared to a baseline learned heuristic.
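To make the search setting concrete, the following is a minimal sketch of greedy best-first search guided by an arbitrary heuristic function. It is not the authors' implementation: the function `gbfs`, the toy state space (integers 0 to 10 on a line), and the stand-in heuristic `toy_h` are all assumptions introduced for illustration. In the paper's setting, `h` would be a neural network trained on sampled states and their cost-to-goal estimates.

```python
import heapq
from itertools import count
from typing import Callable, Dict, Hashable, Iterable, List, Optional

State = Hashable


def gbfs(
    start: State,
    is_goal: Callable[[State], bool],
    successors: Callable[[State], Iterable[State]],
    h: Callable[[State], float],
) -> Optional[List[State]]:
    """Greedy best-first search: always expand the open state with the lowest h."""
    tie = count()  # tie-breaker so the heap never compares states directly
    open_list = [(h(start), next(tie), start)]
    parent: Dict[State, Optional[State]] = {start: None}
    while open_list:
        _, _, state = heapq.heappop(open_list)
        if is_goal(state):
            # Follow parent pointers back to the start to rebuild the path.
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:  # each state is generated at most once
                parent[nxt] = state
                heapq.heappush(open_list, (h(nxt), next(tie), nxt))
    return None  # open list exhausted: goal unreachable


# Hypothetical toy task: states are the integers 0..10, the goal is 7.
GOAL = 7


def toy_successors(s: int) -> List[int]:
    return [t for t in (s - 1, s + 1) if 0 <= t <= 10]


def toy_h(s: int) -> float:
    # Stand-in for a learned heuristic; here it is the exact distance to GOAL.
    return abs(s - GOAL)


if __name__ == "__main__":
    print(gbfs(2, lambda s: s == GOAL, toy_successors, toy_h))
```

Because GBFS orders the open list by `h` alone and ignores the cost accumulated so far, the number of states it expands depends entirely on the quality of the heuristic estimates, which is why the abstract evaluates learned heuristics by the coverage GBFS achieves with them.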
