Paper title

A New Multiple Max-pooling Integration Module and Cross Multiscale Deconvolution Network Based on Image Semantic Segmentation

Authors

You, Hongfeng; Tian, Shengwei; Yu, Long; Ma, Xiang; Xing, Yan; Xin, Ning

Abstract

To better retain the deep features of an image and to solve the sparsity problem of end-to-end segmentation models, we propose a new deep convolutional network model for pixel-level medical image segmentation, called MC-Net. The core of the model consists of four parts: an encoder network, a multiple max-pooling integration module, a cross multiscale deconvolution decoder network, and a pixel-level classification layer. In the encoder, we use multiscale convolution instead of the traditional single-channel convolution. The multiple max-pooling integration module first integrates the output features of each submodule of the encoder network and reduces the number of parameters through convolution with a kernel size of 1. At the same time, a max-pooling layer (with a different pooling size for each layer) is appended after each convolution so that the feature maps of each submodule gain translation invariance. The output feature maps of the multiple max-pooling integration module serve as the input of the decoder network; the multiscale convolution of each decoder submodule is cross-fused with the feature maps generated by the corresponding multiscale convolution in the encoder network. These feature map processing steps address the sparsity of the matrices generated by the max-pooling layers and enhance the robustness of the classification. We compare the proposed model with the well-known Fully Convolutional Networks for Semantic Segmentation (FCNs), DeconvNet, PSPNet, U-Net, and SegNet, and with other state-of-the-art segmentation networks such as HyperDenseNet, MS-Dual, ESPNetv2, and DenseASPP, on one binary dataset (Kaggle 2018 Data Science Bowl) and two multiclass datasets, and obtain encouraging experimental results.
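The abstract describes the integration module only in outline, so the following is a minimal sketch of its two named ingredients rather than the authors' implementation: a convolution with kernel size 1 (per-pixel channel mixing) followed by max-pooling with a stage-specific pooling size. The function names, the weights, and the choice of pooling sizes that bring both stages to the same resolution are all assumptions made for illustration.

```python
def conv1x1(fmap, weights):
    """Kernel-size-1 convolution: mixes channels at each pixel independently.

    fmap: list of C_in channels, each an H x W list of lists.
    weights: C_out rows of C_in coefficients (hypothetical values).
    """
    h, w = len(fmap[0]), len(fmap[0][0])
    return [
        [[sum(wk * fmap[k][i][j] for k, wk in enumerate(row)) for j in range(w)]
         for i in range(h)]
        for row in weights
    ]


def max_pool(channel, size):
    """Non-overlapping max-pooling (stride equals size) over one H x W channel."""
    h, w = len(channel), len(channel[0])
    return [
        [max(channel[i + di][j + dj] for di in range(size) for dj in range(size))
         for j in range(0, w - size + 1, size)]
        for i in range(0, h - size + 1, size)
    ]


def integrate(encoder_outputs, weights, pool_sizes):
    """Apply conv1x1, then a stage-specific max-pool, to each encoder output."""
    pooled = []
    for fmap, size in zip(encoder_outputs, pool_sizes):
        reduced = conv1x1(fmap, weights)  # 2 channels -> 1 channel here
        pooled.append([max_pool(ch, size) for ch in reduced])
    return pooled


# Two hypothetical encoder stages with 2 channels each: 8x8 and 4x4.
stage_a = [[[i * 8 + j for j in range(8)] for i in range(8)],
           [[1] * 8 for _ in range(8)]]
stage_b = [[[i * 4 + j for j in range(4)] for i in range(4)],
           [[1] * 4 for _ in range(4)]]

# Assumed pooling sizes 4 and 2 bring both stages to a 2x2 resolution.
out = integrate([stage_a, stage_b], weights=[[1, 1]], pool_sizes=[4, 2])
```

Within one pooling window, the position of the maximum does not affect the output, which is the local translation invariance the abstract refers to: `max_pool([[9, 0], [0, 0]], 2)` and `max_pool([[0, 0], [0, 9]], 2)` give the same result.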
