Modality Competition: What Makes Joint Training of Multi-modal Network Fail in Deep Learning? (Provably)

Paper Title

Modality Competition: What Makes Joint Training of Multi-modal Network Fail in Deep Learning? (Provably)

Authors

Yu Huang, Junyang Lin, Chang Zhou, Hongxia Yang, Longbo Huang

Abstract

Despite the remarkable success of deep multi-modal learning in practice, it has not been well-explained in theory. Recently, it has been observed that the best uni-modal network outperforms the jointly trained multi-modal network, which is counter-intuitive since multiple signals generally bring more information. This work provides a theoretical explanation for the emergence of such a performance gap in neural networks for the prevalent joint training framework. Based on a simplified data distribution that captures the realistic property of multi-modal data, we prove that for the multi-modal late-fusion network with (smoothed) ReLU activation trained jointly by gradient descent, different modalities will compete with each other. The encoder networks will learn only a subset of modalities. We refer to this phenomenon as modality competition. The losing modalities, which fail to be discovered, are the origins where the sub-optimality of joint training comes from. Experimentally, we illustrate that modality competition matches the intrinsic behavior of late-fusion joint training.
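
To make the setting in the abstract concrete, the following is a minimal NumPy sketch of a two-modality late-fusion network with a smoothed ReLU activation, trained jointly by full-batch gradient descent. Everything in it is an illustrative assumption rather than the paper's construction: the toy data generator `make_toy_data`, the particular smoothed ReLU form (polynomial of degree `q` below `beta`, linear above), fusion by summing encoder outputs, the fixed ±1/h output weights, and all hyperparameters.

```python
import numpy as np

def smoothed_relu(z, q=3, beta=1.0):
    """Assumed smoothed ReLU: 0 for z <= 0, z^q / (q * beta^(q-1)) on [0, beta], linear above."""
    zc = np.clip(z, 0.0, beta)
    poly = zc ** q / (q * beta ** (q - 1))
    lin = z - (1.0 - 1.0 / q) * beta
    return np.where(z >= beta, lin, poly)

def smoothed_relu_grad(z, q=3, beta=1.0):
    """Derivative of smoothed_relu with respect to z."""
    zc = np.clip(z, 0.0, beta)
    return np.where(z >= beta, 1.0, (zc / beta) ** (q - 1))

def make_toy_data(n=200, d=5, s1=2.0, s2=1.0, noise=0.1, seed=0):
    """Hypothetical toy data: each modality is label * signal direction + Gaussian noise."""
    rng = np.random.default_rng(seed)
    y = rng.choice([-1.0, 1.0], size=n)
    u = np.zeros(d)
    u[0] = 1.0  # shared signal direction (an assumption of this toy setup)
    X1 = s1 * y[:, None] * u + noise * rng.standard_normal((n, d))
    X2 = s2 * y[:, None] * u + noise * rng.standard_normal((n, d))
    return X1, X2, y

def forward(W1, W2, a, X1, X2):
    """Late fusion: encode each modality separately, sum the features, apply a fixed head."""
    Z1, Z2 = X1 @ W1.T, X2 @ W2.T  # pre-activations, shape (n, h)
    out = (smoothed_relu(Z1) + smoothed_relu(Z2)) @ a
    return out, Z1, Z2

def logistic_loss(out, y):
    """Mean of log(1 + exp(-y * out)), computed stably."""
    return np.mean(np.logaddexp(0.0, -y * out))

def train_joint(X1, X2, y, h=8, lr=0.1, steps=300, seed=0):
    """Jointly train both encoders with full-batch gradient descent on the logistic loss."""
    rng = np.random.default_rng(seed)
    W1 = 0.1 * rng.standard_normal((h, X1.shape[1]))
    W2 = 0.1 * rng.standard_normal((h, X2.shape[1]))
    a = np.where(np.arange(h) % 2 == 0, 1.0, -1.0) / h  # fixed +-1/h output weights
    losses = []
    for _ in range(steps):
        out, Z1, Z2 = forward(W1, W2, a, X1, X2)
        losses.append(logistic_loss(out, y))
        # d(loss)/d(out) per sample: -y * sigmoid(-y * out) / n, written with tanh for stability
        g = -y * 0.5 * (1.0 - np.tanh(0.5 * y * out)) / len(y)
        G1 = (g[:, None] * smoothed_relu_grad(Z1) * a).T @ X1
        G2 = (g[:, None] * smoothed_relu_grad(Z2) * a).T @ X2
        W1 -= lr * G1
        W2 -= lr * G2
    return W1, W2, a, losses
```

One way to probe for the behavior the abstract describes is to call `train_joint(*make_toy_data())` and compare the magnitudes of `smoothed_relu(X1 @ W1.T) @ a` and `smoothed_relu(X2 @ W2.T) @ a` after training. This toy setup is not claimed to reproduce the paper's theorem or experiments.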

Paper Title