On the Importance of Image Encoding in Automated Chest X-Ray Report Generation

Paper Title

On the Importance of Image Encoding in Automated Chest X-Ray Report Generation

Authors

Otabek Nazarov, Mohammad Yaqub, Karthik Nandakumar

Abstract

Chest X-ray is one of the most popular medical imaging modalities due to its accessibility and effectiveness. However, there is a chronic shortage of well-trained radiologists who can interpret these images and diagnose the patient's condition. Therefore, automated radiology report generation can be a very helpful tool in clinical practice. A typical report generation workflow consists of two main steps: (i) encoding the image into a latent space and (ii) generating the text of the report based on the latent image embedding. Many existing report generation techniques use a standard convolutional neural network (CNN) architecture for image encoding, followed by a Transformer-based decoder for medical text generation. In most cases, the CNN and the decoder are trained jointly in an end-to-end fashion. In this work, we primarily focus on understanding the relative importance of the encoder and decoder components. To this end, we analyze four different image encoding approaches (direct, fine-grained, CLIP-based, and Cluster-CLIP-based encodings) in conjunction with three different decoders on the large-scale MIMIC-CXR dataset. Among these encoders, the Cluster-CLIP visual encoder is a novel approach that aims to generate more discriminative and explainable representations. CLIP-based encoders produce results comparable to traditional CNN-based encoders in terms of NLP metrics, while fine-grained encoding outperforms all other encoders in terms of both NLP and clinical accuracy metrics, thereby validating the importance of the image encoder in effectively extracting semantic information. GitHub repository: https://github.com/mudabek/encoding-cxr-report-gen
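The two-step workflow described in the abstract (encode the image into a latent representation, then decode that representation into text), and the idea of pairing several encoders with several decoders, can be illustrated with a toy sketch. Everything below is a hypothetical stand-in written for illustration: the function names, the mean-pooling and per-row "encoders", and the threshold "decoder" are assumptions and are not the authors' models or code from their repository.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Image = List[List[float]]       # toy grayscale image: rows of pixel intensities
Embedding = List[float]         # one latent vector
Encoder = Callable[[Image], List[Embedding]]
Decoder = Callable[[List[Embedding]], str]


def global_encoder(image: Image) -> List[Embedding]:
    """Hypothetical coarse encoder: a single vector of per-row means."""
    return [[sum(row) / len(row) for row in image]]


def row_encoder(image: Image) -> List[Embedding]:
    """Hypothetical finer encoder: one vector per image row."""
    return [list(row) for row in image]


def threshold_decoder(embeddings: List[Embedding]) -> str:
    """Hypothetical decoder: emits placeholder text from the peak latent value."""
    peak = max(max(vector) for vector in embeddings)
    return "bright region detected" if peak > 0.5 else "no bright region detected"


def generate_report(image: Image, encoder: Encoder, decoder: Decoder) -> str:
    """Step (i): encode the image; step (ii): decode the latent into text."""
    latent = encoder(image)
    return decoder(latent)


def run_grid(
    image: Image,
    encoders: Dict[str, Encoder],
    decoders: Dict[str, Decoder],
) -> Dict[Tuple[str, str], str]:
    """Run every encoder/decoder pairing on the same image."""
    return {
        (enc_name, dec_name): generate_report(image, enc, dec)
        for (enc_name, enc), (dec_name, dec) in product(
            encoders.items(), decoders.items()
        )
    }


if __name__ == "__main__":
    toy_image = [[0.1, 0.2], [0.9, 0.3]]
    results = run_grid(
        toy_image,
        {"global": global_encoder, "row": row_encoder},
        {"threshold": threshold_decoder},
    )
    for pairing, report in results.items():
        print(pairing, "->", report)
```

Keeping the encoder and decoder behind plain callable interfaces is one simple way to swap either component independently, which mirrors the kind of encoder-versus-decoder comparison the abstract describes.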
