Paper Title

Adaptive Stress Testing without Domain Heuristics using Go-Explore

Authors

Mark Koren, Mykel J. Kochenderfer

Abstract

Recently, reinforcement learning (RL) has been used as a tool for finding failures in autonomous systems. During execution, the RL agents often rely on some domain-specific heuristic reward to guide them towards finding failures, but constructing such a heuristic may be difficult or infeasible. Without a heuristic, the agent may only receive rewards at the time of failure, or even rewards that guide it away from failures. For example, some approaches give rewards for taking more-likely actions, because we want to find more-likely failures. However, the agent may then learn to only take likely actions, and may not be able to find a failure at all. Consequently, the problem becomes a hard-exploration problem, where rewards do not aid exploration. A new algorithm, go-explore (GE), has recently set new records on benchmarks from the hard-exploration field. We apply GE to adaptive stress testing (AST), one example of an RL-based falsification approach that provides a way to search for the most-likely failure scenario. We simulate a scenario where an autonomous vehicle drives while a pedestrian is crossing the road. We demonstrate that GE is able to find failures without domain-specific heuristics, such as the distance between the car and the pedestrian, on scenarios that other RL techniques are unable to solve. Furthermore, inspired by the robustification phase of GE, we demonstrate that the backwards algorithm (BA) improves the failures found by other RL techniques.
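The core idea the abstract describes — returning to previously reached states and exploring onward from them, with no shaped reward until a failure occurs — can be illustrated with a minimal Go-Explore-style archive loop. This is an illustrative sketch on a toy deterministic environment of my own construction (the integer-position "environment", `GOAL` threshold, and all function names are assumptions, not the paper's actual AST simulator or cell representation):

```python
import random

# Toy deterministic environment: an agent position on a line.
# A "failure" occurs only when the position reaches GOAL; there is
# no shaped reward guiding the search, mirroring the heuristic-free
# hard-exploration setting described in the abstract.
GOAL = 8

def step(state, action):
    # action in {-1, 0, +1}; state is an integer position
    return state + action

def is_failure(state):
    return state >= GOAL

def go_explore(seed=0, iterations=2000):
    """Minimal Go-Explore-style archive loop (illustrative sketch).

    Cells are simply the integer states here. Because the environment
    is deterministic, replaying a stored action sequence suffices to
    return ("go") to a visited cell before exploring further.
    """
    rng = random.Random(seed)
    archive = {0: []}  # cell -> shortest action sequence reaching it
    for _ in range(iterations):
        cell = rng.choice(list(archive))
        # "Go": replay the stored trajectory to restore the cell.
        state = 0
        for a in archive[cell]:
            state = step(state, a)
        traj = list(archive[cell])
        # "Explore": take a few random actions from the restored cell,
        # archiving any new cell (or a shorter route to a known one).
        for _ in range(3):
            a = rng.choice([-1, 0, 1])
            state = step(state, a)
            traj.append(a)
            if state not in archive or len(traj) < len(archive[state]):
                archive[state] = list(traj)
            if is_failure(state):
                return traj  # action sequence that triggers the failure
    return None

failure_traj = go_explore()
```

The returned action sequence is a concrete failure demonstration; in the paper's setting, GE's robustification phase (and the backwards algorithm mentioned in the abstract) would then be used to turn such a demonstration into a more likely failure trajectory.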
