Paper Title

Avoiding Side Effects in Complex Environments

Authors

Alexander Matt Turner, Neale Ratzlaff, Prasad Tadepalli

Abstract

Reward function specification can be difficult. Rewarding the agent for making a widget may be easy, but penalizing the multitude of possible negative side effects is hard. In toy environments, Attainable Utility Preservation (AUP) avoided side effects by penalizing shifts in the ability to achieve randomly generated goals. We scale this approach to large, randomly generated environments based on Conway's Game of Life. By preserving optimal value for a single randomly generated reward function, AUP incurs modest overhead while leading the agent to complete the specified task and avoid many side effects. Videos and code are available at https://avoiding-side-effects.github.io/.
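
As a rough illustration of the idea in the abstract, the sketch below shapes a task reward by subtracting a penalty proportional to how much an action shifts the value attainable for one auxiliary reward function. It assumes the shift is measured against a do-nothing action and scaled by a fixed weight; the function name, parameter names, and default weight are hypothetical and do not come from the paper's code.

```python
def aup_reward(task_reward, q_aux_action, q_aux_noop, penalty_weight=0.01):
    """Return the task reward minus a scaled attainable-utility penalty.

    q_aux_action: estimated auxiliary value after taking the chosen action.
    q_aux_noop:   estimated auxiliary value after doing nothing instead
                  (the no-op baseline is an assumption of this sketch).
    penalty_weight: hypothetical scaling factor for the penalty.
    """
    # The penalty is the absolute shift in auxiliary value caused by acting.
    penalty = abs(q_aux_action - q_aux_noop)
    return task_reward - penalty_weight * penalty


if __name__ == "__main__":
    # An action that leaves the auxiliary value unchanged is not penalized.
    print(aup_reward(1.0, 0.5, 0.5))
    # An action that shifts the auxiliary value is penalized in proportion.
    print(aup_reward(1.0, 0.9, 0.4, penalty_weight=0.1))
```

In this form, actions that sharply raise or lower the agent's ability to pursue the auxiliary goal both incur a cost, which is one way to read "penalizing shifts" in the abstract.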
