Paper Title

Learn to Earn: Enabling Coordination within a Ride Hailing Fleet

Paper Authors

Chaudhari, Harshal A., Byers, John W., Terzi, Evimaria

Paper Abstract

The problem of optimizing social welfare objectives on multi-sided ride-hailing platforms such as Uber, Lyft, etc., is challenging, due to misalignment of objectives between drivers, passengers, and the platform itself. An ideal solution aims to minimize the response time for each hyper-local passenger ride request, while simultaneously maintaining high demand satisfaction and supply utilization across the entire city. Economists tend to rely on dynamic pricing mechanisms that stifle price-sensitive excess demand and resolve the supply-demand imbalances emerging in specific neighborhoods. In contrast, computer scientists primarily view it as a demand prediction problem with the goal of preemptively repositioning supply to such neighborhoods using black-box coordinated multi-agent deep reinforcement learning based approaches. Here, we introduce explainability in the existing supply repositioning approaches by establishing the need for coordination between the drivers at specific locations and times. Explicit need-based coordination allows our framework to use a simpler non-deep-reinforcement-learning-based approach, thereby enabling it to explain its recommendations ex post. Moreover, it provides envy-free recommendations, i.e., drivers at the same location and time do not envy one another's future earnings. Our experimental evaluation demonstrates the effectiveness, the robustness, and the generalizability of our framework. Finally, in contrast to previous works, we make available a reinforcement learning environment for end-to-end reproducibility of our work and to encourage future comparative studies.

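Note: the abstract states that the recommendations are envy-free, meaning drivers at the same location and time do not envy one another's expected future earnings. A minimal formalization sketch of that property, using illustrative notation that is not taken from the paper: let $\pi_i$ denote the repositioning recommendation given to driver $i$, and let $V_i(\pi \mid \ell, t)$ denote driver $i$'s expected future earnings from following recommendation $\pi$ starting at location $\ell$ and time $t$. Envy-freeness then requires, for all drivers $i, j$ co-located at the same $(\ell, t)$:

$$ \mathbb{E}\big[ V_i(\pi_i \mid \ell, t) \big] \;\ge\; \mathbb{E}\big[ V_i(\pi_j \mid \ell, t) \big]. $$

In words, no driver would expect to earn more by swapping recommendations with another driver at the same location and time.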