Data Augmentation for Histopathological Images Based on Gaussian-Laplacian Pyramid Blending

Paper Title

Data Augmentation for Histopathological Images Based on Gaussian-Laplacian Pyramid Blending

Authors

Ataky, Steve Tsham Mpinda; de Matos, Jonathan; Britto Jr., Alceu de S.; Oliveira, Luiz E. S.; Koerich, Alessandro L.

Abstract

Data imbalance is a major problem that affects several machine learning (ML) algorithms. It is troublesome because most ML algorithms optimize a loss function that does not take the imbalance into account. As a result, the algorithm tends to produce a trivial model that is biased toward predicting the most frequent class in the training data. In the case of histopathologic images (HIs), both low-level and high-level data augmentation (DA) techniques still present performance issues when applied in the presence of inter-patient variability, because the model tends to learn color representations that are related to the staining process. In this paper, we propose a novel approach capable of not only augmenting an HI dataset but also distributing the inter-patient variability by means of image blending using the Gaussian-Laplacian pyramid. The proposed approach consists of building the Gaussian pyramids of two images from different patients and deriving their Laplacian pyramids. Afterwards, the left half of one HI and the right half of the other are joined at each level of the Laplacian pyramid, and an image is reconstructed from the joint pyramid. This composition combines the stain variation of two patients, preventing color differences from misleading the learning process. Experimental results on the BreakHis dataset show promising gains compared with the majority of DA techniques presented in the literature.
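
The blending procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (blur, laplacian_pyramid, blend_halves, and so on), the 5-tap binomial kernel used as a Gaussian approximation, the nearest-neighbor upsampling step, and the default number of levels are all choices made here for illustration, and both input images are assumed to have the same shape.

```python
import numpy as np

# 5-tap binomial kernel, a common approximation of a Gaussian (an assumption here).
KERNEL = np.array([1, 4, 6, 4, 1], dtype=np.float64) / 16.0


def blur(img):
    """Separable 5-tap blur with reflect padding; img is (H, W) or (H, W, C)."""
    out = img.astype(np.float64)
    for axis in (0, 1):
        pad = [(0, 0)] * out.ndim
        pad[axis] = (2, 2)
        padded = np.pad(out, pad, mode="reflect")
        n = out.shape[axis]
        # Weighted sum of five shifted views along the current axis.
        out = sum(k * np.take(padded, np.arange(i, i + n), axis=axis)
                  for i, k in enumerate(KERNEL))
    return out


def downsample(img):
    """Blur, then keep every second row and column."""
    return blur(img)[::2, ::2]


def upsample(img, shape):
    """Repeat pixels 2x along rows and columns, crop to shape, then blur."""
    up = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)
    return blur(up[:shape[0], :shape[1]])


def laplacian_pyramid(img, levels):
    """Build the Gaussian pyramid, then take differences between levels."""
    gauss = [img.astype(np.float64)]
    for _ in range(levels):
        gauss.append(downsample(gauss[-1]))
    lap = [g - upsample(g_next, g.shape)
           for g, g_next in zip(gauss[:-1], gauss[1:])]
    lap.append(gauss[-1])  # coarsest Gaussian level is kept as the residual
    return lap


def reconstruct(lap):
    """Collapse a Laplacian pyramid back into a full-resolution image."""
    img = lap[-1]
    for band in reversed(lap[:-1]):
        img = upsample(img, band.shape) + band
    return img


def blend_halves(img_a, img_b, levels=4):
    """Join the left half of img_a and the right half of img_b at every level."""
    lap_a = laplacian_pyramid(img_a, levels)
    lap_b = laplacian_pyramid(img_b, levels)
    joined = []
    for la, lb in zip(lap_a, lap_b):
        mid = la.shape[1] // 2
        joined.append(np.concatenate([la[:, :mid], lb[:, mid:]], axis=1))
    return reconstruct(joined)
```

Because the halves are joined at every pyramid level rather than only at full resolution, the coarse levels smooth the transition between the two patients' stain colors while the fine levels keep local tissue detail. The output is floating point, so it would typically be clipped to the valid intensity range before being saved or fed to a model.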
