Hierarchical Dynamic Image Harmonization

Paper Title

Hierarchical Dynamic Image Harmonization

Authors

Haoxing Chen, Zhangxuan Gu, Yaohui Li, Jun Lan, Changhua Meng, Weiqiang Wang, Huaxiong Li

Abstract

Image harmonization is a critical task in computer vision that aims to adjust the foreground to make it compatible with the background. Recent works mainly focus on global transformations (i.e., normalization and color curve rendering) to achieve visual consistency. However, these models ignore local visual consistency, and their large model sizes limit their use on edge devices. In this paper, we propose a hierarchical dynamic network (HDNet) that adapts features from a local to a global view for better feature transformation in efficient image harmonization. Inspired by the success of various dynamic models, we propose a local dynamic (LD) module and a mask-aware global dynamic (MGD) module. Specifically, LD matches local representations between the foreground and background regions based on semantic similarity, then adaptively adjusts every foreground local representation according to the appearance of its $K$-nearest neighbor background regions. In this way, LD can produce more realistic images at a finer-grained level while also benefiting from semantic alignment. MGD applies distinct convolutions to the foreground and background, learning the representations of foreground and background regions as well as their correlations to the global harmonization, which facilitates local visual consistency much more efficiently. Experimental results demonstrate that the proposed HDNet reduces the total number of model parameters by more than 80\% compared to previous methods while still attaining state-of-the-art performance on the popular iHarmony4 dataset. Notably, HDNet achieves a 4\% improvement in PSNR and a 19\% reduction in MSE compared to the prior state-of-the-art methods.
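
The matching step attributed to the LD module can be illustrated with a small NumPy sketch. The abstract does not say how similarity is measured or how the foreground representation is adjusted, so everything below is an assumption made for illustration: cosine similarity, softmax weighting over the $K$ neighbors, and a fixed blend factor `alpha`. The function name `local_dynamic_sketch` and its parameters are hypothetical and are not taken from HDNet's implementation.

```python
import numpy as np


def local_dynamic_sketch(features, mask, k=3, temperature=0.1, alpha=0.5):
    """Adjust each foreground feature using its K most similar background features.

    features: (N, C) array, one row per local region (e.g. a flattened feature map).
    mask:     (N,) boolean array, True where the region belongs to the foreground.
    Returns a new (N, C) array; background rows are left unchanged.
    """
    features = np.asarray(features, dtype=np.float64)
    mask = np.asarray(mask, dtype=bool)
    fg, bg = features[mask], features[~mask]
    if len(fg) == 0 or len(bg) == 0:
        return features.copy()
    k = min(k, len(bg))

    # Cosine similarity between every foreground and background region
    # (assumed similarity measure).
    fg_unit = fg / (np.linalg.norm(fg, axis=1, keepdims=True) + 1e-8)
    bg_unit = bg / (np.linalg.norm(bg, axis=1, keepdims=True) + 1e-8)
    sim = fg_unit @ bg_unit.T  # shape (num_fg, num_bg)

    # Indices of the K most similar background regions per foreground region.
    topk = np.argsort(-sim, axis=1)[:, :k]  # shape (num_fg, k)
    topk_sim = np.take_along_axis(sim, topk, axis=1)

    # Softmax over the K similarities (assumed weighting scheme).
    logits = topk_sim / temperature
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)

    # Weighted average of the K neighbors' features, shape (num_fg, C).
    neighbor_mean = (weights[:, :, None] * bg[topk]).sum(axis=1)

    # Assumed adjustment: blend each foreground feature toward its neighbors.
    out = features.copy()
    out[mask] = (1.0 - alpha) * fg + alpha * neighbor_mean
    return out


# Toy usage: 6 regions with 4 channels each; the first two are foreground.
rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 4))
fg_mask = np.array([True, True, False, False, False, False])
adjusted = local_dynamic_sketch(feats, fg_mask, k=2)
```

In this sketch only the foreground rows change, which mirrors the abstract's description of adjusting the foreground according to the background. A learned module would replace the fixed blend with parameters predicted from the matched regions.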
