Paper Title
Progressive Fusion for Multimodal Integration
Paper Authors
Paper Abstract
Integration of multimodal information from various sources has been shown to boost the performance of machine learning models and thus has received increased attention in recent years. Often such models use deep modality-specific networks to obtain unimodal features, which are combined to obtain "late-fusion" representations. However, these designs run the risk of information loss in the respective unimodal pipelines. On the other hand, "early-fusion" methodologies, which combine features early, suffer from the problems associated with feature heterogeneity and high sample complexity. In this work, we present an iterative representation refinement approach, called Progressive Fusion, which mitigates the issues with late-fusion representations. Our model-agnostic technique introduces backward connections that make late-stage fused representations available to early layers, improving the expressiveness of the representations at those stages while retaining the advantages of late-fusion designs. We test Progressive Fusion on tasks including affective sentiment detection, multimedia analysis, and time series fusion with different models, demonstrating its versatility. We show that our approach consistently improves performance, for instance attaining a 5% reduction in MSE and a 40% improvement in robustness on multimodal time series prediction.
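To make the backward-connection idea concrete, below is a minimal PyTorch sketch of one plausible reading of the abstract: unimodal encoders produce features, a late-fusion head combines them, and the fused representation is fed back to the encoder inputs for further refinement steps. This is not the authors' implementation; the module names, dimensions, and number of refinement iterations are illustrative assumptions.

import torch
import torch.nn as nn

class ProgressiveFusion(nn.Module):
    """Hypothetical sketch of progressive fusion with backward connections.

    All sizes and the number of refinement steps are illustrative
    assumptions, not taken from the paper.
    """

    def __init__(self, dim_a=16, dim_b=8, hidden=32, fused=32, steps=2):
        super().__init__()
        # Modality-specific encoders; each also receives the fused
        # representation fed back through the backward connection.
        self.enc_a = nn.Sequential(nn.Linear(dim_a + fused, hidden), nn.ReLU())
        self.enc_b = nn.Sequential(nn.Linear(dim_b + fused, hidden), nn.ReLU())
        # Late-fusion head that combines the unimodal features.
        self.fuse = nn.Sequential(nn.Linear(2 * hidden, fused), nn.ReLU())
        self.steps = steps
        self.fused_dim = fused

    def forward(self, x_a, x_b):
        # Start from a zero fused representation, then iteratively refine:
        # the fused output of step t is concatenated back into the
        # unimodal encoders' inputs at step t + 1.
        z = x_a.new_zeros(x_a.size(0), self.fused_dim)
        for _ in range(self.steps):
            h_a = self.enc_a(torch.cat([x_a, z], dim=-1))
            h_b = self.enc_b(torch.cat([x_b, z], dim=-1))
            z = self.fuse(torch.cat([h_a, h_b], dim=-1))
        return z

model = ProgressiveFusion()
out = model(torch.randn(4, 16), torch.randn(4, 8))
print(out.shape)  # torch.Size([4, 32])

With steps=1 the loop reduces to a standard late-fusion forward pass, which illustrates how the technique is a model-agnostic wrapper around an existing fusion design rather than a new architecture.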