论文标题
随机最短的路径,成本不断变化
Stochastic Shortest Path with Adversarially Changing Costs
论文作者
论文摘要
随机最短路径(SSP)是计划和控制方面的一个众所周知的问题,在该计划和控制中,代理必须达到目标状态,以最低的总预期成本达到目标状态。在本文中,我们介绍了对抗性SSP模型,该模型还考虑了随着时间的推移成本的对抗性变化,而基础过渡功能保持不变。正式地,代理与$ K $插入的SSP环境相互作用,成本函数在情节之间任意变化,并且代理商未知过渡。我们为对抗性SSP开发了第一个算法,并证明了$ \ widetilde o(\ sqrt {k})$的高概率后悔界限,假设所有成本都严格为正,而$ \ wideTilde o(k^{3/4})$在总体情况下。我们是第一个考虑这种自然环境的对抗性SSP并为此获得次线性遗憾的人。
Stochastic shortest path (SSP) is a well-known problem in planning and control, in which an agent has to reach a goal state in minimum total expected cost. In this paper we present the adversarial SSP model that also accounts for adversarial changes in the costs over time, while the underlying transition function remains unchanged. Formally, an agent interacts with an SSP environment for $K$ episodes, the cost function changes arbitrarily between episodes, and the transitions are unknown to the agent. We develop the first algorithms for adversarial SSPs and prove high probability regret bounds of $\widetilde O (\sqrt{K})$ assuming all costs are strictly positive, and $\widetilde O (K^{3/4})$ in the general case. We are the first to consider this natural setting of adversarial SSP and obtain sub-linear regret for it.