Federated Learning for Computational Pathology on Gigapixel Whole Slide Images

Paper Title

Federated Learning for Computational Pathology on Gigapixel Whole Slide Images

Authors

Lu, Ming Y.; Kong, Dehan; Lipkova, Jana; Chen, Richard J.; Singh, Rajendra; Williamson, Drew F. K.; Chen, Tiffany Y.; Mahmood, Faisal

Abstract

Deep learning-based computational pathology algorithms have demonstrated a profound ability to excel in a wide array of tasks that range from characterization of well-known morphological phenotypes to predicting non-human-identifiable features from histology, such as molecular alterations. However, the development of robust, adaptable, and accurate deep learning-based models often relies on the collection and time-costly curation of large, high-quality annotated training data that should ideally come from diverse sources and patient populations to cater for the heterogeneity that exists in such datasets. Multi-centric and collaborative integration of medical data across multiple institutions can naturally help overcome this challenge and boost model performance, but it is limited by privacy concerns, among other difficulties that may arise in the complex data-sharing process as models scale towards using hundreds of thousands of gigapixel whole slide images. In this paper, we introduce privacy-preserving federated learning for gigapixel whole slide images in computational pathology using weakly supervised attention multiple instance learning and differential privacy. We evaluated our approach on two different diagnostic problems using thousands of histology whole slide images with only slide-level labels. Additionally, we present a weakly supervised learning framework for survival prediction and patient stratification from whole slide images and demonstrate its effectiveness in a federated setting. Our results show that, using federated learning, we can effectively develop accurate weakly supervised deep learning models from distributed data silos without direct data sharing and its associated complexities, while also preserving differential privacy using randomized noise generation.
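
The abstract describes training across distributed data silos without sharing data, with randomized noise added for privacy. The sketch below illustrates that general idea only: each site perturbs its locally trained weights with Gaussian noise, and a server averages the noisy weights in proportion to each site's dataset size. All function names, the flat-list weight representation, and the `sigma` parameter are assumptions made for illustration; they are not taken from the paper, and the sketch by itself does not establish a formal differential privacy guarantee.

```python
import random


def add_gaussian_noise(weights, sigma, rng):
    """Return a copy of one site's weights with zero-mean Gaussian noise added.

    `sigma` is a hypothetical noise scale chosen for illustration.
    """
    return [w + rng.gauss(0.0, sigma) for w in weights]


def federated_average(client_weights, client_sizes):
    """Average weight vectors, weighting each site by its local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def federated_round(client_weights, client_sizes, sigma=0.0, seed=0):
    """One aggregation round: perturb each site's weights, then average them.

    Only the (noisy) weights reach the server; the raw slides never leave
    the site that holds them.
    """
    rng = random.Random(seed)
    noisy = [add_gaussian_noise(w, sigma, rng) for w in client_weights]
    return federated_average(noisy, client_sizes)


if __name__ == "__main__":
    # Two hypothetical sites holding 1 and 3 slides respectively.
    site_weights = [[1.0, 2.0], [3.0, 4.0]]
    site_sizes = [1, 3]
    print(federated_round(site_weights, site_sizes, sigma=0.0))
    print(federated_round(site_weights, site_sizes, sigma=0.1, seed=42))
```

With `sigma=0.0` the round reduces to plain size-weighted averaging; increasing `sigma` trades model accuracy for stronger masking of each site's contribution.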
