Paper Title

User-Conditioned Neural Control Policies for Mobile Robotics

Paper Authors

Leonard Bauersfeld, Elia Kaufmann, Davide Scaramuzza

Paper Abstract

Recently, learning-based controllers have been shown to push mobile robotic systems to their limits and provide the robustness needed for many real-world applications. However, only classical optimization-based control frameworks offer the inherent flexibility to be dynamically adjusted during execution by, for example, setting target speeds or actuator limits. We present a framework to overcome this shortcoming of neural controllers by conditioning them on an auxiliary input. This advance is enabled by including a feature-wise linear modulation (FiLM) layer. We use model-free reinforcement learning to train quadrotor control policies for the task of navigating through a sequence of waypoints in minimum time. By conditioning the policy on the maximum available thrust or the viewing direction relative to the next waypoint, a user can regulate the aggressiveness of the quadrotor's flight during deployment. We demonstrate in simulation and in real-world experiments that a single control policy can achieve close to time-optimal flight performance across the entire performance envelope of the robot, reaching up to 60 km/h and 4.5g in acceleration. The ability to guide a learned controller during task execution has implications beyond agile quadrotor flight, as conditioning the control policy on human intent helps safely bring learning-based systems out of the well-defined laboratory environment into the wild.
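To illustrate the conditioning mechanism mentioned in the abstract, below is a minimal PyTorch-style sketch of a policy network with a FiLM layer. The class name, layer sizes, and the choice of conditioning signal are illustrative assumptions, not the paper's exact architecture; it only shows the general idea of modulating intermediate features with a per-feature scale and shift computed from an auxiliary input (e.g., a thrust limit).

```python
import torch
import torch.nn as nn

class FiLMPolicy(nn.Module):
    """Sketch of a control policy conditioned on an auxiliary input via
    feature-wise linear modulation (FiLM). Dimensions are hypothetical."""

    def __init__(self, obs_dim: int, cond_dim: int, act_dim: int, hidden: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        # FiLM generator: maps the conditioning input (e.g. maximum available
        # thrust or viewing direction) to a per-feature scale and shift.
        self.film = nn.Linear(cond_dim, 2 * hidden)
        self.head = nn.Sequential(nn.Tanh(), nn.Linear(hidden, act_dim))

    def forward(self, obs: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        h = self.encoder(obs)
        gamma, beta = self.film(cond).chunk(2, dim=-1)
        h = gamma * h + beta        # feature-wise linear modulation
        return self.head(h)         # e.g. collective thrust and body rates

# Usage: the same trained network produces different behaviour depending on
# the conditioning value supplied at deployment time.
policy = FiLMPolicy(obs_dim=18, cond_dim=1, act_dim=4)
obs = torch.randn(1, 18)
aggressive = policy(obs, torch.tensor([[1.0]]))  # full thrust available
gentle = policy(obs, torch.tensor([[0.3]]))      # thrust capped at 30%
```

Because the conditioning input is just another network input, the same mechanism lets a user adjust the policy continuously during deployment without retraining.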
