Analyzing the Effects of Handling Data Imbalance on Learned Features from Medical Images by Looking Into the Models

Paper Title

Analyzing the Effects of Handling Data Imbalance on Learned Features from Medical Images by Looking Into the Models

Authors

Ashkan Khakzar, Yawei Li, Yang Zhang, Mirac Sanisoglu, Seong Tae Kim, Mina Rezaei, Bernd Bischl, Nassir Navab

Abstract

One challenging property lurking in medical datasets is imbalanced data distribution, where the sample frequencies of the different classes are not balanced. Training a model on an imbalanced dataset can introduce unique challenges to the learning problem, as the model becomes biased towards the highly frequent class. Many methods have been proposed to tackle distributional differences and the imbalance problem. However, the impact of these approaches on the learned features is not well studied. In this paper, we look deeper into the internal units of neural networks to observe how handling data imbalance affects the learned features. We study several popular cost-sensitive approaches for handling data imbalance and analyze the feature maps of the convolutional neural networks from multiple perspectives: analyzing the alignment of salient features with pathologies and analyzing the pathology-related concepts encoded by the networks. Our study reveals differences and insights regarding the trained models that are not reflected by quantitative metrics such as AUROC and AP and that show up only when looking into the models.
