Paper Title

Overcoming Statistical Shortcuts for Open-ended Visual Counting

Authors

Corentin Dancette, Remi Cadene, Xinlei Chen, Matthieu Cord

Abstract

Machine learning models tend to over-rely on statistical shortcuts. These spurious correlations between parts of the input and the output labels do not hold in real-world settings. We target this issue on the recent open-ended visual counting task, which is well suited to studying statistical shortcuts. We aim to develop models that learn a proper mechanism of counting regardless of the output label. First, we propose the Modifying Count Distribution (MCD) protocol, which penalizes models that over-rely on statistical shortcuts. It is based on pairs of training and testing sets that do not follow the same count label distribution, such as the odd-even sets. Intuitively, models that have learned a proper mechanism of counting on odd numbers should perform well on even numbers. Second, we introduce the Spatial Counting Network (SCN), which is dedicated to visual analysis and counting based on natural language questions. Our model selects relevant image regions, scores them with fusion and self-attention mechanisms, and provides a final counting score. We apply our protocol to the recent TallyQA dataset and show superior performance compared to state-of-the-art models. We also demonstrate the ability of our model to select the correct instances to count in the image. Code and datasets are available at https://github.com/cdancette/spatial-counting-network
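
The odd-even split mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the authors' repository: the function name `split_odd_even`, the sample dictionaries with a `count` field, and the treatment of zero as an even count are all assumptions made for illustration.

```python
def split_odd_even(samples, train_parity="odd"):
    """Split samples so that training and testing counts have opposite parity.

    Hypothetical helper: each sample is assumed to be a dict with an
    integer "count" field holding the ground-truth answer.
    """
    if train_parity not in ("odd", "even"):
        raise ValueError("train_parity must be 'odd' or 'even'")
    train_remainder = 1 if train_parity == "odd" else 0
    # Samples whose count matches the chosen parity go to training,
    # all others go to testing (zero is treated as even here).
    train = [s for s in samples if s["count"] % 2 == train_remainder]
    test = [s for s in samples if s["count"] % 2 != train_remainder]
    return train, test


# Made-up toy samples, not TallyQA data.
samples = [
    {"question": "How many dogs are there?", "count": 0},
    {"question": "How many cups are on the table?", "count": 1},
    {"question": "How many people are sitting?", "count": 2},
    {"question": "How many cars are parked?", "count": 3},
    {"question": "How many birds are flying?", "count": 4},
    {"question": "How many chairs are visible?", "count": 5},
]

train, test = split_odd_even(samples, train_parity="odd")
print("train counts:", [s["count"] for s in train])
print("test counts:", [s["count"] for s in test])
```

Under such a split, a model that only memorizes the most frequent training answers cannot score well at test time, because none of the test labels appear in training.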