Paper title

A completely annotated whole slide image dataset of canine breast cancer to aid human breast cancer research

Authors

Marc Aubreville, Christof A. Bertram, Taryn A. Donovan, Christian Marzahl, Andreas Maier, Robert Klopfleisch

Abstract

Canine mammary carcinoma (CMC) has been used as a model to investigate the pathogenesis of human breast cancer and the same grading scheme is commonly used to assess tumor malignancy in both. One key component of this grading scheme is the density of mitotic figures (MF). Current publicly available datasets on human breast cancer only provide annotations for small subsets of whole slide images (WSIs). We present a novel dataset of 21 WSIs of CMC completely annotated for MF. For this, a pathologist screened all WSIs for potential MF and structures with a similar appearance. A second expert blindly assigned labels, and for non-matching labels, a third expert assigned the final labels. Additionally, we used machine learning to identify previously undetected MF. Finally, we performed representation learning and two-dimensional projection to further increase the consistency of the annotations. Our dataset consists of 13,907 MF and 36,379 hard negatives. We achieved a mean F1-score of 0.791 on the test set and of up to 0.696 on a human breast cancer dataset.
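
The three-expert labeling rule described in the abstract can be sketched roughly as follows. This is only an illustration, not the authors' tooling: the function name `consensus_label`, the label strings, and the callable standing in for the third expert are all assumptions made for this example.

```python
from typing import Callable

# Hypothetical label names; the abstract only distinguishes mitotic figures
# (MF) from similar-looking structures (hard negatives).
MF = "mitotic_figure"
HARD_NEGATIVE = "hard_negative"


def consensus_label(first: str, second: str, third_expert: Callable[[], str]) -> str:
    """Return the final label for one candidate cell.

    first        -- label from the pathologist who screened the slide
    second       -- label assigned blindly by the second expert
    third_expert -- called only when the first two labels disagree
    """
    if first == second:
        # Matching labels are accepted without a third opinion.
        return first
    # Non-matching labels are resolved by the third expert.
    return third_expert()
```

Passing the third expert as a callable means the third opinion is requested only for disputed candidates, which mirrors the procedure in which the third expert reviews non-matching labels only.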
