Nested Mixture of Experts: Cooperative and Competitive Learning of Hybrid Dynamical System

Paper Title

Nested Mixture of Experts: Cooperative and Competitive Learning of Hybrid Dynamical System

Authors

Junhyeok Ahn, Luis Sentis

Abstract

Model-based reinforcement learning (MBRL) algorithms can attain significant sample efficiency but require an appropriate network structure to represent system dynamics. Current approaches include white-box modeling using analytic parameterizations and black-box modeling using deep neural networks. However, both can suffer from a bias-variance trade-off in the learning process, and neither provides a structured method for injecting domain knowledge into the network. As an alternative, gray-box modeling leverages prior knowledge in neural network training but only for simple systems. In this paper, we devise a nested mixture of experts (NMOE) for representing and learning hybrid dynamical systems. An NMOE combines both white-box and black-box models while optimizing bias-variance trade-off. Moreover, an NMOE provides a structured method for incorporating various types of prior knowledge by training the associative experts cooperatively or competitively. The prior knowledge includes information on robots' physical contacts with the environments as well as their kinematic and dynamic properties. In this paper, we demonstrate how to incorporate prior knowledge into our NMOE in various continuous control domains, including hybrid dynamical systems. We also show the effectiveness of our method in terms of data-efficiency, generalization to unseen data, and bias-variance trade-off. Finally, we evaluate our NMOE using an MBRL setup, where the model is integrated with a model-based controller and trained online.
