Paper Title
Deep Inventory Management
Paper Authors
Paper Abstract
This work provides a Deep Reinforcement Learning approach to solving a periodic review inventory control system with stochastic vendor lead times, lost sales, correlated demand, and price matching. While this dynamic program has historically been considered intractable, our results show that several policy learning approaches are competitive with or outperform classical methods. To train these algorithms, we develop novel techniques for converting historical data into a simulator. On the theoretical side, we present learnability results on a subclass of inventory control problems, for which we provide a provable reduction of the reinforcement learning problem to supervised learning. On the algorithmic side, we present a model-based reinforcement learning procedure (Direct Backprop) that solves the periodic review inventory control problem by constructing a differentiable simulator. Under a variety of metrics, Direct Backprop outperforms model-free RL and newsvendor baselines in both simulations and real-world deployments.