Paper Title

Learning Enriched Features for Fast Image Restoration and Enhancement

Authors

Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao

Abstract

Given a degraded input image, image restoration aims to recover the missing high-quality image content. Numerous applications demand effective image restoration, e.g., computational photography, surveillance, autonomous vehicles, and remote sensing. Significant advances in image restoration have been made in recent years, dominated by convolutional neural networks (CNNs). The widely used CNN-based methods typically operate either on full-resolution or on progressively low-resolution representations. In the former case, spatial details are preserved but the contextual information cannot be precisely encoded. In the latter case, generated outputs are semantically reliable but spatially less accurate. This paper presents a new architecture with a holistic goal of maintaining spatially-precise high-resolution representations throughout the entire network while receiving complementary contextual information from the low-resolution representations. The core of our approach is a multi-scale residual block containing the following key elements: (a) parallel multi-resolution convolution streams for extracting multi-scale features, (b) information exchange across the multi-resolution streams, (c) a non-local attention mechanism for capturing contextual information, and (d) attention-based multi-scale feature aggregation. Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details. Extensive experiments on six real image benchmark datasets demonstrate that our method, named MIRNet-v2, achieves state-of-the-art results for a variety of image processing tasks, including defocus deblurring, image denoising, super-resolution, and image enhancement. The source code and pre-trained models are available at https://github.com/swz30/MIRNetv2
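To make element (d), attention-based multi-scale feature aggregation, concrete, the following is a minimal toy sketch in pure Python. It is an illustrative assumption, not the paper's actual implementation: the real architecture operates on convolutional feature maps and learns the attention weights, whereas here each stream is a plain feature vector and the weights come from a simple softmax over pooled per-stream descriptors. The function names `softmax` and `fuse_multiscale` are hypothetical.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scalars.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def fuse_multiscale(features):
    """Fuse features from parallel resolution streams with attention weights.

    features: list of equal-length feature vectors, one per stream
    (in the real network, lower-resolution maps would first be
    upsampled to a common spatial size).
    """
    # 1. Global descriptor per stream: average-pool its features.
    descriptors = [sum(f) / len(f) for f in features]
    # 2. Attention weights: softmax over the per-stream descriptors,
    #    so the weights are non-negative and sum to 1.
    weights = softmax(descriptors)
    # 3. Fused output: attention-weighted sum across streams.
    n = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(n)]
```

Because the weights form a convex combination, each fused value lies between the corresponding values of the input streams; the learned version in the paper serves the same role of letting the network emphasize whichever scale carries the most useful information at each location.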
