Paper Title
Assessing Evolutionary Terrain Generation Methods for Curriculum Reinforcement Learning
Paper Authors
Paper Abstract
Curriculum learning allows complex tasks to be mastered via incremental progression over 'stepping stone' goals towards a final desired behaviour. Typical implementations learn locomotion policies for challenging environments through gradual complexification of a terrain mesh generated through a parameterised noise function. To date, researchers have predominantly generated terrains from a limited range of noise functions, and the effect of the generator on the learning process is underrepresented in the literature. We compare popular noise-based terrain generators to two indirect encodings, CPPN and GAN. To allow direct comparison between direct and indirect representations, we assess the impact of a range of representation-agnostic MAP-Elites feature descriptors that compute metrics directly from the generated terrain meshes. Next, performance and coverage are assessed when training a humanoid robot in a physics simulator using the PPO algorithm. Results describe key differences between the generators that inform their use in curriculum learning, and present a range of useful feature descriptors for uptake by the community.
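To make the idea of a representation-agnostic feature descriptor concrete, the following is a minimal sketch, not the authors' implementation. It assumes a terrain is available as a square grid of heights, regardless of which generator produced it, and computes two illustrative descriptors (height roughness and mean absolute slope) that index a MAP-Elites archive. The descriptor choices, the function names, the bounds, and the uniform-noise stand-in generator are all hypothetical; the abstract does not specify which descriptors the paper uses.

```python
import random
from statistics import pstdev


def make_noise_terrain(size, amplitude, seed=0):
    """Hypothetical stand-in generator: uniform random heights in [0, amplitude]."""
    rng = random.Random(seed)
    return [[rng.uniform(0.0, amplitude) for _ in range(size)] for _ in range(size)]


def roughness(heights):
    """Population standard deviation of all heights in the grid."""
    return pstdev([h for row in heights for h in row])


def mean_abs_slope(heights, cell_size=1.0):
    """Mean absolute height difference between 4-connected neighbours, per unit length."""
    rows, cols = len(heights), len(heights[0])
    diffs = []
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                diffs.append(abs(heights[r][c + 1] - heights[r][c]))
            if r + 1 < rows:
                diffs.append(abs(heights[r + 1][c] - heights[r][c]))
    return sum(diffs) / (len(diffs) * cell_size)


def to_cell(value, low, high, bins):
    """Map a descriptor value to a bin index, clamping values outside [low, high]."""
    frac = (value - low) / (high - low)
    return min(bins - 1, max(0, int(frac * bins)))


def insert(archive, heights, fitness, bounds, bins=10):
    """Keep the terrain if its cell is empty or it beats the current elite.

    `bounds` holds assumed (low, high) ranges for each descriptor.
    Returns True when the archive was updated.
    """
    descriptors = (roughness(heights), mean_abs_slope(heights))
    cell = tuple(to_cell(d, lo, hi, bins) for d, (lo, hi) in zip(descriptors, bounds))
    if cell not in archive or fitness > archive[cell][0]:
        archive[cell] = (fitness, heights)
        return True
    return False


def coverage(archive, bins=10):
    """Fraction of the 2D descriptor grid that holds an elite."""
    return len(archive) / (bins * bins)
```

Because the descriptors read only the height grid, the same `insert` call works whether the terrain came from a noise function, a CPPN, or a GAN. In a curriculum setting the `fitness` argument would come from the trained policy's performance on that terrain, which this sketch leaves to the caller.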