Paper title
COMPASS: Contrastive Multimodal Pretraining for Autonomous Systems
Paper authors
Paper abstract
Learning representations that generalize across tasks and domains is challenging yet necessary for autonomous systems. Although task-driven approaches are appealing, designing models specific to each application can be difficult in the face of limited data, especially when dealing with highly variable multimodal input spaces arising from different tasks in different environments. We introduce the first general-purpose pretraining pipeline, COntrastive Multimodal Pretraining for AutonomouS Systems (COMPASS), to overcome the limitations of task-specific models and existing pretraining approaches. COMPASS constructs a multimodal graph by considering the essential information for autonomous systems and the properties of different modalities. Through this graph, multimodal signals are connected and mapped into two factorized spatio-temporal latent spaces: a "motion pattern space" and a "current state space." By learning from multimodal correspondences in each latent space, COMPASS creates state representations that model necessary information such as temporal dynamics, geometry, and semantics. We pretrain COMPASS on the large-scale multimodal simulation dataset TartanAir \cite{tartanair2020iros} and evaluate it on drone navigation, vehicle racing, and visual odometry tasks. The experiments indicate that COMPASS can tackle all three scenarios and can also generalize to unseen environments and real-world data.
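
The abstract does not state the exact training objective, so the following is only a minimal sketch of how "learning from multimodal correspondences" in one latent space is commonly set up: a symmetric InfoNCE-style contrastive loss, assumed here rather than taken from the paper. The function names, the temperature value, and the toy embeddings (standing in for two modalities already projected into a shared latent space such as the "motion pattern space") are all hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss; anchors[i] and positives[i] are the matched pair.

    All other positives in the batch act as negatives for anchors[i].
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of anchor i to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(a)


def symmetric_info_nce(z_a, z_b, temperature=0.1):
    """Average the loss over both directions (modality A to B and B to A)."""
    return 0.5 * (info_nce(z_a, z_b, temperature) + info_nce(z_b, z_a, temperature))


# Hypothetical embeddings of two modalities for the same two time windows.
z_modality_a = [[1.0, 0.0], [0.0, 1.0]]
z_modality_b = [[0.9, 0.1], [0.1, 0.9]]

aligned_loss = symmetric_info_nce(z_modality_a, z_modality_b)
# Reversing one batch breaks the correspondence, so the loss should rise.
shuffled_loss = symmetric_info_nce(z_modality_a, z_modality_b[::-1])
print(aligned_loss < shuffled_loss)
```

Under this assumed objective, embeddings of corresponding multimodal samples are pulled together while non-corresponding samples in the batch are pushed apart; a factorized design like the one described would apply such a loss separately in each latent space.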