Lesion Harvester: Iteratively Mining Unlabeled Lesions and Hard-Negative Examples at Scale

Paper Title

Lesion Harvester: Iteratively Mining Unlabeled Lesions and Hard-Negative Examples at Scale

Authors

Jinzheng Cai, Adam P. Harrison, Youjing Zheng, Ke Yan, Yuankai Huo, Jing Xiao, Lin Yang, Le Lu

Abstract

Acquiring large-scale medical image data, necessary for training machine learning algorithms, is frequently intractable, due to prohibitive expert-driven annotation costs. Recent datasets extracted from hospital archives, e.g., DeepLesion, have begun to address this problem. However, these are often incompletely or noisily labeled, e.g., DeepLesion leaves over 50% of its lesions unlabeled. Thus, effective methods to harvest missing annotations are critical for continued progress in medical image analysis. This is the goal of our work, where we develop a powerful system to harvest missing lesions from the DeepLesion dataset at high precision. Accepting the need for some degree of expert labor to achieve high fidelity, we exploit a small fully-labeled subset of medical image volumes and use it to intelligently mine annotations from the remainder. To do this, we chain together a highly sensitive lesion proposal generator and a very selective lesion proposal classifier. While our framework is generic, we optimize our performance by proposing a 3D contextual lesion proposal generator and by using a multi-view multi-scale lesion proposal classifier. These produce harvested and hard-negative proposals, which we then re-use to finetune our proposal generator by using a novel hard negative suppression loss, continuing this process until no extra lesions are found. Extensive experimental analysis demonstrates that our method can harvest an additional 9,805 lesions while keeping precision above 90%. To demonstrate the benefits of our approach, we show that lesion detectors trained on our harvested lesions can significantly outperform the same variants only trained on the original annotations, with a boost in average precision of 7% to 10%. We open source our annotations at https://github.com/JimmyCai91/DeepLesionAnnotation.
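
The iterative loop described in the abstract (propose, classify, harvest, finetune, repeat until nothing new is found) can be sketched roughly as follows. This is an illustrative outline only, not the authors' implementation: the function name harvest_lesions, the three callables, the two score thresholds, and the max_rounds cap are all hypothetical, and the abstract does not say how harvested and hard-negative proposals are actually selected or how the hard negative suppression loss is computed.

```python
from typing import Callable, Hashable, Iterable, Set, Tuple


def harvest_lesions(
    propose: Callable[[], Iterable[Hashable]],
    classify: Callable[[Hashable], float],
    finetune: Callable[[Set[Hashable], Set[Hashable]], None],
    accept_threshold: float = 0.9,  # assumed cut-off, not from the paper
    reject_threshold: float = 0.1,  # assumed cut-off, not from the paper
    max_rounds: int = 10,           # assumed safety cap on iterations
) -> Tuple[Set[Hashable], Set[Hashable]]:
    """Chain a sensitive proposal generator with a selective classifier.

    propose  : stand-in for the lesion proposal generator
    classify : stand-in for the lesion proposal classifier (returns a score)
    finetune : stand-in for finetuning the generator on mined examples
    """
    harvested: Set[Hashable] = set()
    hard_negatives: Set[Hashable] = set()

    for _ in range(max_rounds):
        new_positives: Set[Hashable] = set()
        for proposal in propose():
            if proposal in harvested:
                continue  # already mined in an earlier round
            score = classify(proposal)
            if score >= accept_threshold:
                new_positives.add(proposal)
            elif score <= reject_threshold:
                # Proposed by the generator but confidently rejected.
                hard_negatives.add(proposal)

        if not new_positives:
            break  # stop once a round finds no extra lesions

        harvested |= new_positives
        finetune(harvested, hard_negatives)

    return harvested, hard_negatives
```

In practice the three callables would wrap trained models; here they are left abstract so the control flow (mine, finetune, repeat until no new lesions appear) is the only thing being illustrated.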
