Paper Title

VinDr-CXR: An open dataset of chest X-rays with radiologist's annotations

Authors

Nguyen, Ha Q., Lam, Khanh, Le, Linh T., Pham, Hieu H., Tran, Dat Q., Nguyen, Dung B., Le, Dung D., Pham, Chi M., Tong, Hang T. T., Dinh, Diep H., Do, Cuong D., Doan, Luu T., Nguyen, Cuong N., Nguyen, Binh T., Nguyen, Que V., Hoang, Au D., Phan, Hien N., Nguyen, Anh T., Ho, Phuong H., Ngo, Dat T., Nguyen, Nghia T., Nguyen, Nhan T., Dao, Minh, Vu, Van

Abstract

Most existing chest X-ray datasets include labels from a list of findings without specifying their locations on the radiographs. This limits the development of machine learning algorithms for the detection and localization of chest abnormalities. In this work, we describe a dataset of more than 100,000 chest X-ray scans that were retrospectively collected from two major hospitals in Vietnam. Out of this raw data, we release 18,000 images that were manually annotated by a total of 17 experienced radiologists with 22 local labels of rectangles surrounding abnormalities and 6 global labels of suspected diseases. The released dataset is divided into a training set of 15,000 images and a test set of 3,000 images. Each scan in the training set was independently labeled by 3 radiologists, while each scan in the test set was labeled by the consensus of 5 radiologists. We designed and built a labeling platform for DICOM images to facilitate these annotation procedures. All images are made publicly available (https://www.physionet.org/content/vindr-cxr/1.0.0/) in DICOM format along with the labels of both the training set and the test set.
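
As an illustration of the labeling scheme described in the abstract (a local label is a rectangle around an abnormality, and each training scan is read independently by 3 radiologists), the following is a minimal sketch of how such annotations might be represented and checked. The record layout, the field names (`image_id`, `rad_id`, `label`, `box`), and the sample values are assumptions made for illustration only; they are not taken from the released label files.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Annotation:
    """One radiologist's label on one image (hypothetical record layout)."""
    image_id: str
    rad_id: str
    label: str
    # (x_min, y_min, x_max, y_max) in pixels; None when the label has no rectangle.
    box: Optional[Tuple[float, float, float, float]]


def readers_per_image(annotations: Iterable[Annotation]) -> Dict[str, int]:
    """Count the distinct radiologists who annotated each image."""
    readers = defaultdict(set)
    for ann in annotations:
        readers[ann.image_id].add(ann.rad_id)
    return {image_id: len(rads) for image_id, rads in readers.items()}


# Made-up sample: one training image read by three radiologists.
sample = [
    Annotation("img_001", "R1", "example_finding", (100.0, 200.0, 400.0, 500.0)),
    Annotation("img_001", "R2", "example_finding", (110.0, 190.0, 410.0, 510.0)),
    Annotation("img_001", "R3", "example_global_label", None),
]

counts = readers_per_image(sample)
# Flag any image that does not have exactly 3 independent readers.
incomplete = [image_id for image_id, n in counts.items() if n != 3]
```

Under these assumptions, a check like `incomplete` could serve as a quick sanity test that every training image carries three independent readings before the annotations are merged or used for training.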
