Paper Title

LViT: Language meets Vision Transformer in Medical Image Segmentation

Authors

Zihan Li, Yunxiang Li, Qingde Li, Puyang Wang, Dazhou Guo, Le Lu, Dakai Jin, You Zhang, Qingqi Hong

Abstract

Deep learning has been widely used in medical image segmentation and other areas. However, the performance of existing medical image segmentation models has been limited by the challenge of obtaining sufficiently high-quality labeled data, due to the prohibitive cost of data annotation. To alleviate this limitation, we propose a new text-augmented medical image segmentation model, LViT (Language meets Vision Transformer). In our LViT model, medical text annotations are incorporated to compensate for quality deficiencies in the image data. In addition, the text information can guide the generation of higher-quality pseudo labels in semi-supervised learning. We also propose an Exponential Pseudo-label Iteration mechanism (EPI) to help the Pixel-Level Attention Module (PLAM) preserve local image features in the semi-supervised LViT setting. In our model, an LV (Language-Vision) loss is designed to directly supervise the training on unlabeled images using text information. For evaluation, we construct three multimodal medical segmentation datasets (image + text) containing X-ray and CT images. Experimental results show that our proposed LViT achieves superior segmentation performance in both fully-supervised and semi-supervised settings. The code and datasets are available at https://github.com/HUANGLIZI/LViT.
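The EPI mechanism named in the abstract iteratively refines pseudo labels for unlabeled images. As a rough illustration only, the sketch below shows an exponential-moving-average style pseudo-label update in PyTorch; the function name `epi_update` and the momentum coefficient `beta` are illustrative assumptions rather than the authors' exact formulation (see the linked repository for the real implementation).

```python
import torch

def epi_update(prev_pseudo: torch.Tensor,
               current_pred: torch.Tensor,
               beta: float = 0.99) -> torch.Tensor:
    """Exponential pseudo-label update (illustrative sketch, not the paper's code).

    Blends the running pseudo-label map with the model's current prediction,
    so a single noisy prediction cannot abruptly flip the pseudo labels.
    """
    return beta * prev_pseudo + (1.0 - beta) * current_pred

# Example: refine a running pseudo-label map with a new softmax prediction
# for a 2-class segmentation task on a 224x224 image (shapes assumed).
prev_pseudo = torch.rand(1, 2, 224, 224)
current_pred = torch.softmax(torch.randn(1, 2, 224, 224), dim=1)
refined = epi_update(prev_pseudo, current_pred)
```

The intuition behind such an update is that pseudo labels accumulated over many iterations are more stable than any single prediction, which matches the abstract's claim that EPI helps preserve reliable supervision for PLAM in the semi-supervised setting.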
