用于动态DAG调度的几何深钢筋学习

论文标题

用于动态DAG调度的几何深钢筋学习

Geometric Deep Reinforcement Learning for Dynamic DAG Scheduling

论文作者

Grinsztajn, Nathan, Beaumont, Olivier, Jeannot, Emmanuel, Preux, Philippe

论文摘要

在实践中，面对包含不确定性以及非确定性和动态性的组合优化问题是很常见的。这三个属性要求适当的算法；强化学习（RL）正在以非常自然的方式与他们打交道。如今，尽管做了一些努力，但大多数现实生活中的组合优化问题仍然远远超出了增强学习算法的影响。在本文中，我们提出了一种加强学习方法来解决现实的调度问题，并将其应用于高性能计算社区中通常执行的算法，即chelesky分解。与静态调度相反，在并行执行开始之前，将任务分配给预定订单中的处理器，我们的方法是动态的：任务分配及其执行订购是根据系统状态和意外事件在运行时确定的，这允许更具灵活性。为此，我们的算法将图形神经网络与参与者批评算法（A2C）结合使用，以构建对问题的自适应表示。我们表明，这种方法与高性能计算运行时系统中使用的最新启发式方法具有竞争力。此外，我们的算法不需要明确的环境模型，但是我们证明可以轻松地合并额外的知识并改善性能。我们还展示了这种RL方法提供的关键特性，并研究了其转移能力到其他实例。

In practice, it is quite common to face combinatorial optimization problems which contain uncertainty along with non-determinism and dynamicity. These three properties call for appropriate algorithms; reinforcement learning (RL) is dealing with them in a very natural way. Today, despite some efforts, most real-life combinatorial optimization problems remain out of the reach of reinforcement learning algorithms. In this paper, we propose a reinforcement learning approach to solve a realistic scheduling problem, and apply it to an algorithm commonly executed in the high performance computing community, the Cholesky factorization. On the contrary to static scheduling, where tasks are assigned to processors in a predetermined ordering before the beginning of the parallel execution, our method is dynamic: task allocations and their execution ordering are decided at runtime, based on the system state and unexpected events, which allows much more flexibility. To do so, our algorithm uses graph neural networks in combination with an actor-critic algorithm (A2C) to build an adaptive representation of the problem on the fly. We show that this approach is competitive with state-of-the-art heuristics used in high-performance computing runtime systems. Moreover, our algorithm does not require an explicit model of the environment, but we demonstrate that extra knowledge can easily be incorporated and improves performance. We also exhibit key properties provided by this RL approach, and study its transfer abilities to other instances.

下载PDF全文

下载文献需遵守相关版权规定

论文标题