Paper Title

Modality Competition: What Makes Joint Training of Multi-modal Network Fail in Deep Learning? (Provably)

Authors

Yu Huang, Junyang Lin, Chang Zhou, Hongxia Yang, Longbo Huang

Abstract

Despite the remarkable success of deep multi-modal learning in practice, it has not been well explained in theory. Recently, it has been observed that the best uni-modal network outperforms the jointly trained multi-modal network, which is counter-intuitive since multiple signals generally bring more information. This work provides a theoretical explanation for the emergence of such a performance gap in neural networks under the prevalent joint training framework. Based on a simplified data distribution that captures realistic properties of multi-modal data, we prove that for a multi-modal late-fusion network with (smoothed) ReLU activation trained jointly by gradient descent, different modalities compete with each other, and the encoder networks learn only a subset of modalities. We refer to this phenomenon as modality competition. The losing modalities, which fail to be discovered, are the source of the sub-optimality of joint training. Experimentally, we illustrate that modality competition matches the intrinsic behavior of late-fusion joint training.
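
To make the setting in the abstract concrete, here is a minimal sketch of a two-modality late-fusion forward pass with a smoothed ReLU. It is an illustration only, not the paper's exact model: the polynomial smoothing, the parameters `beta` and `q`, the sum-based fusion, and the names `smoothed_relu` and `late_fusion_forward` are all assumptions chosen for this example.

```python
import numpy as np


def smoothed_relu(z, beta=1.0, q=3):
    """One possible smoothed ReLU (an assumption, not taken from the abstract).

    0 for z <= 0, a degree-q polynomial on (0, beta], and linear above beta.
    The two pieces meet at z = beta with value beta / q.
    """
    z = np.asarray(z, dtype=float)
    poly = np.clip(z, 0.0, beta) ** q / (q * beta ** (q - 1))
    linear = z - (1.0 - 1.0 / q) * beta
    return np.where(z <= 0, 0.0, np.where(z <= beta, poly, linear))


def late_fusion_forward(x1, x2, W1, W2):
    """Late fusion: each modality has its own encoder; outputs are fused by summation.

    x1: (n, d1) inputs of modality 1, W1: (m, d1) encoder weights.
    x2: (n, d2) inputs of modality 2, W2: (m, d2) encoder weights.
    Returns an (n,) array of fused scores.
    """
    h1 = smoothed_relu(x1 @ W1.T)  # (n, m) features from the modality-1 encoder
    h2 = smoothed_relu(x2 @ W2.T)  # (n, m) features from the modality-2 encoder
    return h1.sum(axis=1) + h2.sum(axis=1)


# Hypothetical toy data: 4 samples, modality dimensions 5 and 3, 2 neurons per encoder.
rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=(4, 5)), rng.normal(size=(4, 3))
W1, W2 = rng.normal(size=(2, 5)), rng.normal(size=(2, 3))
scores = late_fusion_forward(x1, x2, W1, W2)
```

Because the fused score is a sum of per-modality terms, the contribution of each encoder can be inspected separately (for example, by zeroing one modality's weights), which is one simple way to check whether joint training has effectively learned only a subset of modalities.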
