学习通用策略自动机，用于关系随机最短路径问题

论文标题

学习通用策略自动机，用于关系随机最短路径问题

Learning Generalized Policy Automata for Relational Stochastic Shortest Path Problems

论文作者

Karia, Rushang, Nayyar, Rashmeet Kaur, Srivastava, Siddharth

论文摘要

现实世界中的几个面向目标的问题自然可以称为随机路径问题（SSP）。但是，求解SSP的计算复杂性使得找到解决方案甚至是适度的问题。目前，现有的最先进的计划者和启发式方法通常无法利用从解决其他实例中学到的知识。本文提出了一种学习\ emph {广义策略自动机}（GPA）的方法：可用于催化解决方案过程的非确定性部分政策。使用基于关系的摘要学习GPA，这使它们适用于具有不同对象名称和数量的广泛相关问题。对该方法的理论分析表明，它保证了完整性和层次最优性。经验分析表明，这种方法有效地以几种方式学习了广泛适用的政策知识，并且在测试问题上的最先进的SSP求解器的测试问题远远超过了对象计数远大于培训中使用的问题。

Several goal-oriented problems in the real-world can be naturally expressed as Stochastic Shortest Path Problems (SSPs). However, the computational complexity of solving SSPs makes finding solutions to even moderately sized problems intractable. Currently, existing state-of-the-art planners and heuristics often fail to exploit knowledge learned from solving other instances. This paper presents an approach for learning \emph{Generalized Policy Automata} (GPA): non-deterministic partial policies that can be used to catalyze the solution process. GPAs are learned using relational, feature-based abstractions, which makes them applicable on broad classes of related problems with different object names and quantities. Theoretical analysis of this approach shows that it guarantees completeness and hierarchical optimality. Empirical analysis shows that this approach effectively learns broadly applicable policy knowledge in a few-shot fashion and significantly outperforms state-of-the-art SSP solvers on test problems whose object counts are far greater than those used during training.

下载PDF全文

下载文献需遵守相关版权规定

论文标题