Paper Title


DE-Net: Dynamic Text-guided Image Editing Adversarial Networks

Paper Authors

Ming Tao, Bing-Kun Bao, Hao Tang, Fei Wu, Longhui Wei, Qi Tian

Paper Abstract


Text-guided image editing models have shown remarkable results. However, there remain two problems. First, they employ fixed manipulation modules for various editing requirements (e.g., color changing, texture changing, content adding and removing), which results in over-editing or insufficient editing. Second, they do not clearly distinguish between text-required and text-irrelevant parts, which leads to inaccurate editing. To solve these limitations, we propose: (i) a Dynamic Editing Block (DEBlock) which composes different editing modules dynamically for various editing requirements. (ii) a Composition Predictor (Comp-Pred) which predicts the composition weights for DEBlock according to the inference on target texts and source images. (iii) a Dynamic text-adaptive Convolution Block (DCBlock) which queries source image features to distinguish text-required parts and text-irrelevant parts. Extensive experiments demonstrate that our DE-Net achieves excellent performance and manipulates source images more correctly and accurately. Code is available at \url{https://github.com/tobran/DE-Net}.
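To make the weighted-composition idea concrete, here is a minimal PyTorch sketch, not the authors' implementation: a Comp-Pred-style MLP predicts per-sample softmax weights from pooled text and image features, and a DEBlock-style module fuses the outputs of several candidate editing modules with those weights. All names (CompPred, DEBlockSketch, edit_modules), module choices, and dimensions here are illustrative assumptions; the paper's actual editing modules and feature extractors differ.

```python
import torch
import torch.nn as nn


class CompPred(nn.Module):
    """Hypothetical composition predictor: maps pooled target-text and
    source-image features to softmax weights over the editing modules."""

    def __init__(self, text_dim: int, img_dim: int, n_modules: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(text_dim + img_dim, 128),
            nn.ReLU(),
            nn.Linear(128, n_modules),
        )

    def forward(self, text_feat: torch.Tensor, img_feat: torch.Tensor) -> torch.Tensor:
        # (B, text_dim) + (B, img_dim) -> (B, n_modules), normalized per sample
        logits = self.mlp(torch.cat([text_feat, img_feat], dim=-1))
        return torch.softmax(logits, dim=-1)


class DEBlockSketch(nn.Module):
    """Hypothetical dynamic editing block: the outputs of several candidate
    editing modules are fused with the weights predicted by CompPred."""

    def __init__(self, channels: int, text_dim: int, img_dim: int, n_modules: int = 4):
        super().__init__()
        # Stand-in editing modules (plain convs); the paper's modules differ.
        self.edit_modules = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            for _ in range(n_modules)
        )
        self.comp_pred = CompPred(text_dim, img_dim, n_modules)

    def forward(self, x, text_feat, img_feat):
        w = self.comp_pred(text_feat, img_feat)                       # (B, N)
        outs = torch.stack([m(x) for m in self.edit_modules], dim=1)  # (B, N, C, H, W)
        return (w[:, :, None, None, None] * outs).sum(dim=1)         # (B, C, H, W)


# Toy usage: edit a batch of two 64-channel feature maps.
block = DEBlockSketch(channels=64, text_dim=256, img_dim=256)
out = block(torch.randn(2, 64, 32, 32), torch.randn(2, 256), torch.randn(2, 256))
print(out.shape)  # torch.Size([2, 64, 32, 32])
```

Because the weights are predicted per sample, different editing requirements (e.g., a color change versus a content removal) can activate different mixtures of modules at inference time, which is the motivation for the dynamic composition over a single fixed manipulation module.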
