Paper title
On the connection between Bregman divergence and value in regularized Markov decision processes
Paper authors
Abstract
In this short note we derive a relationship between the Bregman divergence from the current policy to the optimal policy and the suboptimality of the current value function in a regularized Markov decision process. This result has implications for multi-task reinforcement learning, offline reinforcement learning, and regret analysis under function approximation, among others.
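As a hedged illustration of the kind of relationship the abstract describes, consider the special case of entropy regularization with temperature tau, where the Bregman divergence generated by negative entropy is the KL divergence. A standard identity for that case is V*(s) - V^pi(s) = tau * E_pi[ sum_t gamma^t KL(pi(.|s_t) || pi*(.|s_t)) | s_0 = s ]. This is not claimed to be the note's general statement, only one instance of it. The sketch below checks the identity numerically on a small made-up MDP; the transition table `P`, rewards `R`, discount `GAMMA`, temperature `TAU`, the policy `pi`, and all function names are assumptions introduced for illustration.

```python
import math

GAMMA = 0.9  # discount factor (hypothetical value)
TAU = 0.5    # entropy-regularization temperature (hypothetical value)

# Hypothetical toy MDP with 2 states and 2 actions.
# P[s][a] is the next-state distribution, R[s][a] the immediate reward.
P = [[[0.8, 0.2], [0.1, 0.9]],
     [[0.5, 0.5], [0.3, 0.7]]]
R = [[1.0, 0.0],
     [0.2, 0.8]]
N_S, N_A = 2, 2


def q_from_v(v):
    """One-step lookahead: Q(s, a) = R(s, a) + gamma * E[v(s')]."""
    return [[R[s][a] + GAMMA * sum(P[s][a][t] * v[t] for t in range(N_S))
             for a in range(N_A)] for s in range(N_S)]


def soft_optimal(iters=2000):
    """Soft value iteration: V(s) = tau * log sum_a exp(Q(s, a) / tau)."""
    v = [0.0] * N_S
    for _ in range(iters):
        q = q_from_v(v)
        v = [TAU * math.log(sum(math.exp(q[s][a] / TAU) for a in range(N_A)))
             for s in range(N_S)]
    q = q_from_v(v)
    # Optimal regularized policy is the softmax of Q* / tau.
    pi_star = [[math.exp((q[s][a] - v[s]) / TAU) for a in range(N_A)]
               for s in range(N_S)]
    return v, pi_star


def discounted_sum(pi, c, iters=2000):
    """Solve x = c + gamma * P_pi x by fixed-point iteration."""
    x = [0.0] * N_S
    for _ in range(iters):
        x = [c[s] + GAMMA * sum(pi[s][a] * P[s][a][t] * x[t]
                                for a in range(N_A) for t in range(N_S))
             for s in range(N_S)]
    return x


pi = [[0.6, 0.4], [0.3, 0.7]]  # arbitrary "current" policy (hypothetical)
v_star, pi_star = soft_optimal()

# Regularized value of pi: expected reward plus tau times policy entropy.
reg_reward = [sum(pi[s][a] * (R[s][a] - TAU * math.log(pi[s][a]))
                  for a in range(N_A)) for s in range(N_S)]
v_pi = discounted_sum(pi, reg_reward)

# Discounted sum of tau * KL(pi || pi_star) along trajectories of pi.
kl_cost = [TAU * sum(pi[s][a] * math.log(pi[s][a] / pi_star[s][a])
                     for a in range(N_A)) for s in range(N_S)]
kl_sum = discounted_sum(pi, kl_cost)

gap = [v_star[s] - v_pi[s] for s in range(N_S)]
for s in range(N_S):
    print(f"state {s}: suboptimality={gap[s]:.6f}, divergence sum={kl_sum[s]:.6f}")
```

In this sketch the two printed columns should agree up to floating-point error, since both satisfy the same linear recursion under the current policy. Fixed-point iteration was chosen over a linear solve only to keep the example dependency-free.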