Paper Title
Utility Theory for Sequential Decision Making
Paper Authors
Paper Abstract
The von Neumann-Morgenstern (VNM) utility theorem shows that, under certain axioms of rationality, decision-making reduces to maximizing the expectation of some utility function. We extend these axioms to increasingly structured sequential decision-making settings and identify the structure of the corresponding utility functions. In particular, we show that memoryless preferences lead to a utility in the form of a per-transition reward and a multiplicative factor on the future return. This result motivates a generalization of Markov Decision Processes (MDPs) with this structure on the agent's returns, which we call Affine-Reward MDPs. A stronger constraint on preferences is needed to recover the commonly used cumulative sum of scalar rewards in MDPs. A yet stronger constraint simplifies the utility function for goal-seeking agents to a difference of some function of states, which we call a potential function. Our necessary and sufficient conditions demystify the reward hypothesis that underlies the design of rational agents in reinforcement learning by adding an axiom to the VNM rationality axioms, and they motivate new directions for AI research involving sequential decision making.
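To make the memoryless-preference result concrete, the utility of a trajectory can be written recursively (a minimal sketch in assumed notation; the abstract fixes no symbols, so U, r, m, \gamma, and \Phi below are illustrative):

\[
U(s_0, s_1, s_2, \ldots) \;=\; r(s_0, s_1) \;+\; m(s_0, s_1)\, U(s_1, s_2, \ldots),
\]

where r is the per-transition reward and m the multiplicative factor on the future return, giving the Affine-Reward MDP structure. Fixing m to a constant discount \gamma recovers the standard cumulative return of MDPs, and, under one plausible reading of the goal-seeking case, the utility further collapses to a potential difference:

\[
m \equiv \gamma \;\Rightarrow\; U = \sum_{t \ge 0} \gamma^{t}\, r(s_t, s_{t+1}), \qquad \text{goal-seeking: } U = \Phi(s') - \Phi(s).
\]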