论文标题
基于搜索的测试方法,用于深入强化学习者
A Search-Based Testing Approach for Deep Reinforcement Learning Agents
论文作者
论文摘要
在过去的十年中,深入的强化学习(DRL)算法已越来越多地使用,以解决各种决策问题,例如自主驾驶和机器人技术。但是,这些算法在部署在安全至关重要的环境中时面临着巨大的挑战,因为它们经常表现出错误的行为,可能导致潜在的关键错误。评估DRL代理的安全性的一种方法是测试它们,以检测可能导致执行过程中严重失败的故障。这就提出了一个问题,即我们如何有效测试DRL政策以确保其正确性和遵守安全要求。测试DRL代理的大多数现有作品都使用扰动代理的对抗性攻击。但是,这种攻击通常会导致环境不切实际。他们的主要目标是测试DRL代理的鲁棒性,而不是测试代理商对要求的依从性。由于DRL环境的巨大状态空间,测试执行的高成本以及DRL算法的黑盒性质,因此不可能对DRL代理进行详尽的测试。在本文中,我们提出了一种基于搜索的强化学习剂(Starla)的测试方法,以通过有效地在有限的测试预算中寻找代理的执行失败,以测试DRL代理的策略。我们使用机器学习模型和专用的遗传算法来缩小搜索错误的搜索。我们将Starla应用于深Q学习代理,这些Q-Leacherning代理被广泛用作基准,并表明它通过检测到与代理商策略相关的更多故障来大大优于随机测试。我们还研究了如何使用我们的搜索结果提取表征DRL代理的错误发作的规则。这些规则可用于了解代理失败的条件,从而评估其部署风险。
Deep Reinforcement Learning (DRL) algorithms have been increasingly employed during the last decade to solve various decision-making problems such as autonomous driving and robotics. However, these algorithms have faced great challenges when deployed in safety-critical environments since they often exhibit erroneous behaviors that can lead to potentially critical errors. One way to assess the safety of DRL agents is to test them to detect possible faults leading to critical failures during their execution. This raises the question of how we can efficiently test DRL policies to ensure their correctness and adherence to safety requirements. Most existing works on testing DRL agents use adversarial attacks that perturb states or actions of the agent. However, such attacks often lead to unrealistic states of the environment. Their main goal is to test the robustness of DRL agents rather than testing the compliance of agents' policies with respect to requirements. Due to the huge state space of DRL environments, the high cost of test execution, and the black-box nature of DRL algorithms, the exhaustive testing of DRL agents is impossible. In this paper, we propose a Search-based Testing Approach of Reinforcement Learning Agents (STARLA) to test the policy of a DRL agent by effectively searching for failing executions of the agent within a limited testing budget. We use machine learning models and a dedicated genetic algorithm to narrow the search towards faulty episodes. We apply STARLA on Deep-Q-Learning agents which are widely used as benchmarks and show that it significantly outperforms Random Testing by detecting more faults related to the agent's policy. We also investigate how to extract rules that characterize faulty episodes of the DRL agent using our search results. Such rules can be used to understand the conditions under which the agent fails and thus assess its deployment risks.