Paper Title
Online Distributed Job Dispatching with Outdated and Partially-Observable Information
Paper Authors
Paper Abstract
In this paper, we investigate online distributed job dispatching in an edge computing system residing in a Metropolitan Area Network (MAN). Specifically, job dispatchers are implemented on access points (APs), which collect jobs from mobile users and distribute each job to a server at the edge or in the cloud. A signaling mechanism with periodic broadcast is introduced to facilitate cooperation among the APs. Transmission latency is non-negligible in a MAN, which leads to outdated information sharing among the APs. Moreover, observing the full system state is discouraged, as receiving all broadcasts is time-consuming. Therefore, we formulate the distributed optimization of job dispatching strategies among the APs as a Markov decision process with a partial and outdated system state, i.e., a partially observable Markov decision process (POMDP). Conventional solutions for POMDPs are impractical due to their huge time complexity. We propose a novel low-complexity solution framework for distributed job dispatching, in which the optimization of the job dispatching policy is decoupled via an alternative policy iteration algorithm, so that each AP can perform its policy iteration distributedly according to partial and outdated observations. A theoretical performance lower bound is proved for our approximate MDP solution. Furthermore, we conduct extensive simulations based on the Google Cluster trace. The evaluation results show that our policy achieves up to a $20.67\%$ reduction in average job response time compared with heuristic baselines, and our algorithm performs consistently well under various parameter settings.
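To make the setting concrete, the following is a minimal, self-contained Python sketch of a single AP dispatching jobs using outdated and partially-observable queue information. It is not the paper's algorithm: the paper derives the dispatching decision from an approximate value function via distributed policy iteration, whereas the sketch substitutes a greedy shortest-estimated-queue rule as a stand-in. All names, constants, and dynamics here (e.g., BROADCAST_PERIOD, OBSERVED_SERVERS, the toy queue evolution) are hypothetical assumptions for illustration only.

```python
import random

# Illustrative sketch (assumptions, not the paper's method): one AP dispatches
# each arriving job to the server with the smallest *estimated* queue length.
# Estimates are refreshed only by periodic broadcasts (hence outdated), and the
# AP listens to only a subset of servers (hence partially observable).

BROADCAST_PERIOD = 5       # slots between signaling broadcasts (assumed)
OBSERVED_SERVERS = [0, 2]  # the AP receives broadcasts from these servers only

class Dispatcher:
    def __init__(self, num_servers):
        # Last broadcast queue-length estimate per server (initially assumed 0).
        self.estimates = {s: 0 for s in range(num_servers)}

    def on_broadcast(self, server_id, queue_len):
        """Update the (soon-to-be outdated) estimate for an observed server."""
        if server_id in OBSERVED_SERVERS:
            self.estimates[server_id] = queue_len

    def dispatch(self):
        """Greedy stand-in policy: choose the server with the smallest estimate.
        The paper instead obtains this choice from distributed policy iteration
        on an approximate MDP."""
        return min(self.estimates, key=self.estimates.get)

# Toy usage: queues evolve every slot, but the AP only observes them every
# BROADCAST_PERIOD slots, so its decisions rely on stale information.
queues = [0, 0, 0]
ap = Dispatcher(num_servers=3)
for t in range(20):
    if t % BROADCAST_PERIOD == 0:
        for s in OBSERVED_SERVERS:
            ap.on_broadcast(s, queues[s])
    target = ap.dispatch()
    queues[target] += 1                       # new job joins the chosen queue
    queues = [max(q - 1, 0) for q in queues]  # each server serves one job per slot
    if random.random() < 0.5:
        queues[random.randrange(3)] += 1      # background load from other APs
```

In this toy loop, the gap between the true queue lengths and the AP's stale, partial estimates is exactly what the POMDP formulation in the paper is designed to handle.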