Paper Title


Deep Reinforcement Learning for Crowdsourced Urban Delivery: System States Characterization, Heuristics-guided Action Choice, and Rule-Interposing Integration

Authors

Tanvir Ahamed, Bo Zou, Nahid Parvez Farazi, Theja Tulabandhula

Abstract


This paper investigates the problem of assigning shipping requests to ad hoc couriers in the context of crowdsourced urban delivery. The shipping requests are spatially distributed, each with a limited time window between the earliest time for pickup and the latest time for delivery. The ad hoc couriers, termed crowdsourcees, also have limited time availability and carrying capacity. We propose a new deep reinforcement learning (DRL)-based approach to tackling this assignment problem. A deep Q network (DQN) algorithm is trained, which entails two salient features, experience replay and a target network, that enhance the efficiency, convergence, and stability of DRL training. More importantly, this paper makes three methodological contributions: 1) presenting a comprehensive and novel characterization of crowdshipping system states that encompasses spatial-temporal and capacity information of crowdsourcees and requests; 2) embedding heuristics that leverage the information offered by the state representation and are based on intuitive reasoning to guide the choice of actions, preserving tractability and enhancing training efficiency; and 3) integrating rule-interposing to prevent repeated visits to the same routes and node sequences during routing improvement, thereby further enhancing training efficiency by accelerating learning. The effectiveness of the proposed approach is demonstrated through extensive numerical analysis. The results show the benefits brought by heuristics-guided action choice and rule-interposing in DRL training, and the superiority of the proposed approach over existing heuristics in solution quality, solution time, and scalability. Besides its potential to improve the efficiency of crowdshipping operations planning, the proposed approach also provides a new avenue and a generic framework for other problems in the vehicle routing context.
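The two DQN stabilizers named in the abstract, experience replay and a target network, can be illustrated with a minimal sketch. The `LinearQ` approximator, buffer capacity, and hyperparameters below are stand-in assumptions for illustration only, not the paper's actual network, state encoding, or training setup:

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Experience replay: store transitions, sample them i.i.d. for updates."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s2, done):
        self.buffer.append((s, a, r, s2, done))

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)
        s, a, r, s2, d = zip(*batch)
        return (np.array(s), np.array(a), np.array(r),
                np.array(s2), np.array(d, dtype=float))

    def __len__(self):
        return len(self.buffer)


class LinearQ:
    """Linear Q(s, a) = s @ W -- a stand-in for the deep network."""

    def __init__(self, n_features, n_actions, lr=0.01):
        self.W = np.zeros((n_features, n_actions))
        self.lr = lr

    def q_values(self, states):
        return states @ self.W

    def update(self, states, actions, targets):
        q = self.q_values(states)
        td = targets - q[np.arange(len(actions)), actions]
        # Gradient step on the squared TD error, per sampled transition.
        for i, a in enumerate(actions):
            self.W[:, a] += self.lr * td[i] * states[i]


def dqn_step(online, target, buffer, batch_size=32, gamma=0.99):
    """One DQN update: bootstrap targets come from the frozen target network."""
    if len(buffer) < batch_size:
        return
    s, a, r, s2, done = buffer.sample(batch_size)
    q_next = target.q_values(s2).max(axis=1)
    targets = r + gamma * (1.0 - done) * q_next
    online.update(s, a, targets)


def sync_target(online, target):
    """Periodically copy online weights into the target network."""
    target.W = online.W.copy()
```

Decoupling the bootstrapped targets (computed with the frozen target network, refreshed only by `sync_target`) from the online network being updated is what gives DQN its convergence and stability benefits; the replay buffer breaks the temporal correlation between consecutive transitions.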
