Paper Title
Distributional Actor-Critic Ensemble for Uncertainty-Aware Continuous Control
Paper Authors
Paper Abstract
Uncertainty quantification is one of the central challenges for machine learning in real-world applications. In reinforcement learning, an agent confronts two kinds of uncertainty: epistemic uncertainty and aleatoric uncertainty. Disentangling and evaluating these uncertainties simultaneously offers the chance to improve the agent's final performance, accelerate training, and facilitate quality assurance after deployment. In this work, we propose an uncertainty-aware reinforcement learning algorithm for continuous control tasks that extends the Deep Deterministic Policy Gradient (DDPG) algorithm. It exploits epistemic uncertainty to accelerate exploration and aleatoric uncertainty to learn a risk-sensitive policy. We conduct numerical experiments showing that our variant of DDPG outperforms vanilla DDPG without uncertainty estimation on benchmark tasks in robotic control and power-grid optimization.
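As a rough illustration of the decomposition the title and abstract allude to, the sketch below shows one standard way an ensemble of distributional critics can separate the two uncertainties: disagreement between critics approximates the epistemic part, while the average spread of each critic's predicted return distribution approximates the aleatoric part. This is a minimal sketch under assumed conventions, not the paper's implementation; the ensemble size, the number of quantiles, and the function `decompose_uncertainty` are illustrative choices of ours.

```python
# Minimal sketch (not the authors' code): uncertainty decomposition from an
# ensemble of distributional critics, assuming each critic outputs N return
# quantiles for a given state-action pair.
import numpy as np

def decompose_uncertainty(quantile_values: np.ndarray) -> tuple[float, float]:
    """quantile_values: shape (K, N) -- N return quantiles predicted by each
    of K independently trained distributional critics for one (s, a) pair."""
    per_critic_mean = quantile_values.mean(axis=1)  # (K,) mean return per critic
    per_critic_var = quantile_values.var(axis=1)    # (K,) spread within each critic
    epistemic = float(per_critic_mean.var())  # disagreement across the ensemble
    aleatoric = float(per_critic_var.mean())  # average intrinsic return spread
    return epistemic, aleatoric

# Example: 5 critics, 32 quantiles each (synthetic values for demonstration).
rng = np.random.default_rng(0)
q = rng.normal(loc=10.0, scale=2.0, size=(5, 32))
eps, ale = decompose_uncertainty(q)
print(f"epistemic={eps:.3f}, aleatoric={ale:.3f}")
```

In the spirit of the abstract, the epistemic term could scale an exploration bonus while the aleatoric term feeds a risk-sensitive objective; both uses are paraphrases of the abstract's claims, not implementation details from the paper.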