Paper Title
MHITNet: a minimized network with a hierarchical context-attentional filter for segmenting medical CT images
Paper Authors
Paper Abstract
In the field of medical CT image processing, convolutional neural networks (CNNs) have been the dominant technique. Encoder-decoder CNNs utilise locality for efficiency, but they cannot properly model interactions between distant pixels. Recent research indicates that self-attention or transformer layers can be stacked to efficiently learn long-range dependencies. By encoding image patches as embeddings, transformers have been applied to computer vision tasks. However, transformer-based architectures lack global semantic information interaction and require large-scale training datasets, making them challenging to train with small data samples. To address these challenges, we present a hierarchical context-attention transformer network (MHITNet) that combines multi-scale, transformer, and hierarchical context-extraction modules in the skip connections. The multi-scale module captures deeper CT semantic information, enabling the transformer to more effectively encode the feature maps of tokenized image patches from the various CNN stages as input attention sequences. The hierarchical context-attention module augments global information and reweights pixels to capture semantic context. Extensive experiments on three datasets show that the proposed MHITNet outperforms current best practices.
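The abstract does not include the authors' implementation, so the following is a minimal PyTorch sketch of the two mechanisms it names: flattening a CNN-stage feature map into a token sequence for self-attention, and a context-attention gate that pools global information to reweight the result. All class names, layer sizes, and the channel-wise sigmoid gating are illustrative assumptions, not the paper's method.

```python
import torch
import torch.nn as nn


class ContextAttentionGate(nn.Module):
    """Hypothetical context-attention filter: pools a global context
    vector, then reweights the feature map with channel-wise gates."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # global context vector (B, C, 1, 1)
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),  # gates in (0, 1), one per channel
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Broadcast the gates over all spatial positions.
        return x * self.mlp(self.pool(x))


class TokenizedTransformerSkip(nn.Module):
    """Sketch of one skip-connection block: tokenize a CNN-stage feature
    map, apply self-attention for long-range interactions, restore the
    spatial layout, then apply the context-attention gate."""

    def __init__(self, channels: int, heads: int = 4, layers: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=channels, nhead=heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)
        self.gate = ContextAttentionGate(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)  # (B, H*W, C) attention sequence
        tokens = self.encoder(tokens)          # long-range pixel interactions
        x = tokens.transpose(1, 2).reshape(b, c, h, w)
        return self.gate(x)


if __name__ == "__main__":
    skip = TokenizedTransformerSkip(channels=64)
    feat = torch.randn(1, 64, 16, 16)      # one CNN-stage feature map
    print(skip(feat).shape)                # torch.Size([1, 64, 16, 16])
```

In the full architecture, one such block would presumably sit on each skip connection, with the multi-scale module supplying feature maps at several resolutions for the transformer to tokenize.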