Paper Title
Removing Bias in Multi-modal Classifiers: Regularization by Maximizing Functional Entropies
Paper Authors
Paper Abstract
Many recent datasets contain a variety of different data modalities, for instance, image, question, and answer data in visual question answering (VQA). When training deep net classifiers on those multi-modal datasets, the modalities get exploited at different scales, i.e., some modalities can more easily contribute to the classification result than others. This is suboptimal because the classifier is inherently biased towards a subset of the modalities. To alleviate this shortcoming, we propose a novel regularization term based on the functional entropy. Intuitively, this term encourages balancing the contribution of each modality to the classification result. However, regularization with the functional entropy is challenging. To address this, we develop a method based on the log-Sobolev inequality, which bounds the functional entropy with the functional Fisher information. Intuitively, this maximizes the amount of information that the modalities contribute. On the two challenging multi-modal datasets VQA-CPv2 and SocialIQ, we obtain state-of-the-art results while exploiting the modalities more uniformly. In addition, we demonstrate the efficacy of our method on Colored MNIST.
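To make the intuition concrete, here is a minimal, hypothetical sketch of the core idea: estimate how sensitive a classifier's output is to perturbations of each modality separately (a Monte-Carlo proxy for a per-modality functional Fisher information), then combine the per-modality terms with a concave function so that balanced contributions score higher than lopsided ones. All function names, the toy linear model, and the variance-based proxy are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    e = np.exp(z - z.max())
    return e / e.sum()

def per_modality_sensitivity(w1, w2, x1, x2, n_samples=256, sigma=0.1, seed=0):
    """Monte-Carlo proxy for each modality's 'functional Fisher information':
    the variance of the predicted-class probability when only that modality's
    input is perturbed with Gaussian noise (a hypothetical simplification of
    the paper's gradient-based quantity)."""
    rng = np.random.default_rng(seed)
    y = int(np.argmax(softmax(w1 @ x1 + w2 @ x2)))  # predicted class
    sensitivities = []
    for which in (0, 1):
        probs = []
        for _ in range(n_samples):
            d1 = x1 + sigma * rng.standard_normal(x1.shape) if which == 0 else x1
            d2 = x2 + sigma * rng.standard_normal(x2.shape) if which == 1 else x2
            probs.append(softmax(w1 @ d1 + w2 @ d2)[y])
        sensitivities.append(float(np.var(probs)))
    return sensitivities

def balance_term(sensitivities, eps=1e-12):
    """Sum of square roots: a concave combination, so two moderately
    informative modalities score higher than one dominant modality with
    the same total -- the 'encourage balance' intuition."""
    return sum(np.sqrt(s + eps) for s in sensitivities)
```

In a training loop one would subtract a scaled `balance_term` from the loss, so gradient descent is pushed toward solutions where both modalities carry information about the prediction.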