Paper Title
Cars Can't Fly up in the Sky: Improving Urban-Scene Segmentation via Height-driven Attention Networks
Paper Authors
Paper Abstract
This paper exploits the intrinsic features of urban-scene images and proposes a general add-on module, called the height-driven attention network (HANet), to improve semantic segmentation of urban-scene images. The module selectively emphasizes informative features or classes according to the vertical position of a pixel. In urban-scene images, the pixel-wise class distributions differ significantly among horizontally segmented sections; for example, cars cannot appear in the upper, sky-dominated part of an image. Likewise, urban-scene images have their own distinct characteristics, yet most semantic segmentation networks do not reflect such unique attributes in their architecture. The proposed network architecture incorporates the capability of exploiting these attributes to handle urban-scene datasets effectively. We validate a consistent performance (mIoU) increase across various semantic segmentation models on two datasets when HANet is adopted. This extensive quantitative analysis demonstrates that adding our module to existing models is easy and cost-effective. Our method achieves new state-of-the-art performance on the Cityscapes benchmark by a large margin among ResNet-101-based segmentation models. We also show that the proposed model is consistent with the facts observed in urban scenes by visualizing and interpreting the attention maps. Our code and trained models are publicly available at https://github.com/shachoi/HANet
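
To make the idea concrete, below is a minimal sketch of height-driven attention in PyTorch: features are pooled along the width axis into a per-row descriptor, 1-D convolutions along the height axis turn that descriptor into per-row channel weights, and those weights re-scale the feature map. The class name HeightAttention, the width-pooling choice, and all layer sizes are illustrative assumptions, not the authors' implementation; see the official repository at https://github.com/shachoi/HANet for the exact architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class HeightAttention(nn.Module):
    """Illustrative sketch (not the official HANet code): per-row channel attention."""

    def __init__(self, in_channels, out_channels, reduction=16):
        super().__init__()
        mid = max(in_channels // reduction, 8)
        # 1-D convolutions operate along the height axis of the pooled descriptor.
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels, mid, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv1d(mid, out_channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x_low, x_high):
        # x_low:  features the attention is computed from, shape (N, C_in, H, W)
        # x_high: features to be re-weighted,              shape (N, C_out, H2, W2)
        z = x_low.mean(dim=3)              # pool over width -> (N, C_in, H)
        a = self.conv(z)                   # per-row channel weights, (N, C_out, H)
        a = F.interpolate(a, size=x_high.shape[2],
                          mode="linear", align_corners=False)
        return x_high * a.unsqueeze(3)     # broadcast the row-wise weights over width

For instance, with x_low of shape (2, 512, 32, 64) and x_high of shape (2, 256, 64, 128), HeightAttention(512, 256)(x_low, x_high) returns a tensor of shape (2, 256, 64, 128) in which each horizontal row has been re-weighted channel-wise, so rows at different heights can emphasize different classes.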