Paper Title
Configurable Agent With Reward As Input: A Play-Style Continuum Generation
Paper Authors
Paper Abstract
Modern video games are becoming richer and more complex in terms of game mechanics. This complexity allows a wide variety of play-styles to emerge among players. From the game designer's point of view, this means anticipating the many different ways the game could be played. Machine Learning (ML) could help address this issue; more precisely, Reinforcement Learning is a promising answer to the need to automate video game testing. In this paper we present a video game environment that lets us define multiple play-styles. We then introduce CARI: a Configurable Agent with Reward as Input, an agent able to simulate a wide continuum of play-styles. It is not constrained to extreme archetypal behaviors like current methods based on reward shaping, and it achieves this with a single training loop instead of the usual one loop per play-style. We compare this novel training approach with the more classic reward shaping approach and conclude that CARI can also outperform the baseline on archetype generation. This novel agent could be used to investigate behaviors and balancing during the production of a video game with a realistic amount of training time.
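The abstract does not specify CARI's architecture, but the core idea of conditioning a single policy on a reward configuration can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the network shapes, the idea of concatenating a reward-weight vector to the observation, and all names are hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn

class RewardConditionedPolicy(nn.Module):
    """Illustrative policy that receives both the game observation and a
    reward-configuration vector, so one trained model can express
    different play-styles at inference time (sketch, not the paper's model)."""

    def __init__(self, obs_dim: int, reward_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + reward_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor, reward_weights: torch.Tensor) -> torch.Tensor:
        # Concatenate the observation with the play-style (reward) configuration.
        return self.net(torch.cat([obs, reward_weights], dim=-1))


# During training, reward weights could be resampled each episode so that a
# single training loop covers a continuum of play-styles; at test time a
# designer picks the weights to obtain the desired behavior.
policy = RewardConditionedPolicy(obs_dim=32, reward_dim=3, n_actions=6)
obs = torch.randn(1, 32)
aggressive = torch.tensor([[1.0, 0.2, 0.0]])  # hypothetical weights for one play-style
logits = policy(obs, aggressive)
```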