Paper Title
Natural vs Balanced Distribution in Deep Learning on Whole Slide Images for Cancer Detection
Paper Authors
Paper Abstract
The class distribution of the data is one of the factors that regulate the performance of machine learning models. However, investigations of the impact of different distributions are scarce in the literature and sometimes absent for domain-specific tasks. In this paper, we analyze the impact of natural and balanced distributions of the training set on deep learning (DL) models applied to histological images, also known as whole slide images (WSIs). WSIs are considered the gold standard for cancer diagnosis. In recent years, researchers have turned their attention to DL models to automate and accelerate the diagnostic process. When training such DL models, it is common to filter out the non-regions-of-interest from the WSIs and adopt an artificial distribution (usually a balanced one). In our analysis, we show that keeping the WSI data in their usual distribution (which we call the natural distribution) for DL training produces fewer false positives (FPs), with a comparable number of false negatives (FNs), than the artificially obtained balanced distribution. We conduct an empirical comparative study with 10 random folds for each distribution, comparing the resulting average performance in terms of five different evaluation metrics. Experimental results show the effectiveness of the natural distribution over the balanced one across all the evaluation metrics.
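To make the two training distributions concrete, the following is a minimal sketch of how a natural and a balanced training set might be drawn from labelled WSI patches. The abstract does not say how the balanced distribution is obtained, so random undersampling of the majority class is an assumption here, as are the function names, the 1:9 tumor-to-normal ratio, and the 80% training fraction.

```python
import random
from collections import Counter


def natural_split(labels, train_fraction, rng):
    """Pick training indices uniformly at random, so the class ratio
    of the full data is preserved in expectation (hypothetical helper)."""
    indices = list(range(len(labels)))
    rng.shuffle(indices)
    return indices[: int(len(indices) * train_fraction)]


def balanced_subset(labels, indices, rng):
    """Undersample every class among `indices` down to the size of the
    rarest class (one assumed way of building a balanced set)."""
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    n_min = min(len(members) for members in by_class.values())
    chosen = []
    for members in by_class.values():
        chosen.extend(rng.sample(members, n_min))
    return chosen


# Hypothetical patch labels: 1 = tumor, 0 = normal (ratio is an assumption).
labels = [1] * 100 + [0] * 900
rng = random.Random(0)

natural = natural_split(labels, 0.8, rng)
balanced = balanced_subset(labels, natural, rng)

natural_counts = Counter(labels[i] for i in natural)
balanced_counts = Counter(labels[i] for i in balanced)
print("natural:", dict(natural_counts))
print("balanced:", dict(balanced_counts))
```

Repeating this with ten different seeds would give ten random folds per distribution, in the spirit of the comparison described above. Note that the balanced set discards most of the normal patches, which is one reason the two training regimes can behave differently on false positives.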