Paper Title
A Learning Based Framework for Handling Uncertain Lead Times in Multi-Product Inventory Management
Paper Authors
Abstract
Most existing literature on supply chain and inventory management considers stochastic demand processes with zero or constant lead times. While uncertainty in lead times can be ignored in certain niche scenarios, most real-world scenarios exhibit stochasticity in lead times. These random fluctuations can be caused by uncertainty in the arrival of raw materials at the manufacturer's end, delays in transportation, an unforeseen surge in demand, and switching to a different vendor, to name a few. Stochasticity in lead times is known to severely degrade the performance of an inventory management system, and it is only fair to bridge this gap in supply chain systems through a principled approach. Motivated by the recently introduced delay-resolved deep Q-learning (DRDQN) algorithm, this paper develops a reinforcement learning based paradigm for handling uncertainty in lead times (\emph{action delay}). Through empirical evaluations, it is further shown that inventory management with uncertain lead times is not only equivalent to inventory management with delayed information sharing across multiple echelons (\emph{observation delay}), but that a model trained to handle one kind of delay is also capable of handling delays of the other kind without being retrained. Finally, we apply the delay-resolved framework to scenarios comprising multiple products subject to stochasticity in lead times, and elucidate how the delay-resolved framework negates the effect of any delay to achieve near-optimal performance.