Paper Title

Characterizing and overcoming the greedy nature of learning in multi-modal deep neural networks

Authors

Nan Wu, Stanisław Jastrzębski, Kyunghyun Cho, Krzysztof J. Geras

Abstract

We hypothesize that due to the greedy nature of learning in multi-modal deep neural networks, these models tend to rely on just one modality while under-fitting the other modalities. Such behavior is counter-intuitive and hurts the models' generalization, as we observe empirically. To estimate the model's dependence on each modality, we compute the gain on the accuracy when the model has access to it in addition to another modality. We refer to this gain as the conditional utilization rate. In the experiments, we consistently observe an imbalance in conditional utilization rates between modalities, across multiple tasks and architectures. Since conditional utilization rate cannot be computed efficiently during training, we introduce a proxy for it based on the pace at which the model learns from each modality, which we refer to as the conditional learning speed. We propose an algorithm to balance the conditional learning speeds between modalities during training and demonstrate that it indeed addresses the issue of greedy learning. The proposed algorithm improves the model's generalization on three datasets: Colored MNIST, ModelNet40, and NVIDIA Dynamic Hand Gesture.
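The conditional utilization rate described above is simply the accuracy gain from giving the model a second modality on top of the one it already has. A minimal sketch of that computation follows; the accuracy numbers are made up for illustration and are not results from the paper.

```python
# Conditional utilization rate, as described in the abstract:
# u(B | A) = accuracy(A and B together) - accuracy(A alone).
# A large imbalance between u(B|A) and u(A|B) signals greedy learning,
# i.e. the model leans on one modality and under-fits the other.

def conditional_utilization_rate(acc_both: float, acc_other_alone: float) -> float:
    """Accuracy gain from adding a modality on top of the other one."""
    return acc_both - acc_other_alone

# Hypothetical evaluation accuracies for a two-modality model:
acc_both = 0.92    # model sees modalities A and B
acc_a_only = 0.90  # model sees only modality A
acc_b_only = 0.65  # model sees only modality B

u_b_given_a = conditional_utilization_rate(acc_both, acc_a_only)  # gain from adding B
u_a_given_b = conditional_utilization_rate(acc_both, acc_b_only)  # gain from adding A

print(f"u(B|A) = {u_b_given_a:.2f}")  # small gain: B is under-utilized
print(f"u(A|B) = {u_a_given_b:.2f}")  # large gain: the model relies on A
```

Because evaluating these held-out accuracies repeatedly during training is expensive, the paper instead balances a cheaper proxy, the conditional learning speed, at training time.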
