用于路径规划，网络运输和加固学习的新拍卖算法

论文标题

用于路径规划，网络运输和加固学习的新拍卖算法

New Auction Algorithms for Path Planning, Network Transport, and Reinforcement Learning

论文作者

Bertsekas, Dimitri

论文摘要

我们考虑了路径计划和网络传输中的一些经典优化问题，并引入了新的基于拍卖的算法，以实现其最佳和次优的解决方案。这些算法基于与对象和随之而来的市场平衡的竞争性竞标相关的数学思想，这些算法是拍卖过程的基础。但是，我们的算法的起点是不同的，即有向图中的加权和未加权的路径构造，而不是将人分配给对象。新算法比现有方法具有多种潜在的优势：在某些重要情况下，它们在经验上更快，例如Max-Flow，它们非常适合在线重新载体，并且可以适应分布式的异步操作。此外，它们允许任意初始价格，而无需互补的懈怠限制，因此非常适合利用加强学习方法，这些方法将使用离线培训与数据以及实时操作期间的在线培训一起使用。新算法还可以在涉及近似的增强学习环境中找到使用，例如多步lookahead和树搜索方案和/或推出算法。

We consider some classical optimization problems in path planning and network transport, and we introduce new auction-based algorithms for their optimal and suboptimal solution. The algorithms are based on mathematical ideas that are related to competitive bidding by persons for objects and the attendant market equilibrium, which underlie auction processes. However, the starting point of our algorithms is different, namely weighted and unweighted path construction in directed graphs, rather than assignment of persons to objects. The new algorithms have several potential advantages over existing methods: they are empirically faster in some important contexts, such as max-flow, they are well-suited for on-line replanning, and they can be adapted to distributed asynchronous operation. Moreover, they allow arbitrary initial prices, without complementary slackness restrictions, and thus are better-suited to take advantage of reinforcement learning methods that use off-line training with data, as well as on-line training during real-time operation. The new algorithms may also find use in reinforcement learning contexts involving approximation, such as multistep lookahead and tree search schemes, and/or rollout algorithms.

下载PDF全文

下载文献需遵守相关版权规定

论文标题