Paper Title
DiffStyler: Controllable Dual Diffusion for Text-Driven Image Stylization
Paper Authors
Paper Abstract
Despite the impressive results of image-guided style transfer methods that handle arbitrary styles, text-driven image stylization has recently been proposed to transfer a natural image into a stylized one according to a textual description of the target style provided by the user. Unlike previous image-to-image transfer approaches, the text-guided stylization process gives users a more precise and intuitive way to express the desired style. However, the large discrepancy between cross-modal inputs and outputs makes it challenging to conduct text-driven image stylization in a typical feed-forward CNN pipeline. In this paper, we present DiffStyler, a dual diffusion processing architecture that controls the balance between the content and style of the diffused results. Cross-modal style information can easily be integrated as step-by-step guidance during the diffusion process. Furthermore, we propose learnable noise derived from the content image as the starting point of the reverse denoising process, enabling the stylization results to better preserve the structural information of the content image. We validate the proposed DiffStyler against baseline methods through extensive qualitative and quantitative experiments. Code is available at \url{https://github.com/haha-lisa/Diffstyler}.
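To make the abstract's key ideas concrete, the sketch below illustrates one way a dual-diffusion sampling loop of this kind could be organized: the reverse process starts from content-derived noise rather than pure Gaussian noise, and each denoising step blends the predictions of a content-oriented branch and a style-oriented branch under a single balance weight. This is a minimal, self-contained PyTorch sketch under assumed design choices; the `TinyDenoiser` stand-in, the blending rule, the `style_weight` and `noise_scale` parameters, and the noise schedule are all illustrative assumptions, not the authors' released implementation (see the linked repository for that).

```python
# Minimal sketch of a dual-diffusion reverse process in the spirit of the abstract.
# All module names, the blending rule, and the schedule are illustrative assumptions.
import torch
import torch.nn as nn


class TinyDenoiser(nn.Module):
    """Stand-in epsilon-predictor; a real system would use pretrained diffusion U-Nets."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.SiLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, x, t):
        # The timestep t is ignored in this toy model; real denoisers condition on it.
        return self.net(x)


def dual_diffusion_stylize(content, content_denoiser, style_denoiser,
                           steps=50, style_weight=0.6, noise_scale=0.5):
    """Reverse denoising that starts from content-based noise and mixes two branches.

    content: (B, 3, H, W) image in [-1, 1].
    style_weight: balance between the style branch (1.0) and the content branch (0.0).
    """
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    # Content-based starting noise: perturb the content image instead of sampling
    # pure Gaussian noise, so the reverse process retains more of the original structure.
    x = torch.sqrt(alpha_bars[-1]) * content + \
        torch.sqrt(1.0 - alpha_bars[-1]) * noise_scale * torch.randn_like(content)

    for t in reversed(range(steps)):
        eps_c = content_denoiser(x, t)   # content-preserving branch
        eps_s = style_denoiser(x, t)     # text/style-guided branch
        eps = (1.0 - style_weight) * eps_c + style_weight * eps_s

        # Standard DDPM-style update using the blended noise prediction.
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        if t > 0:
            x = mean + torch.sqrt(betas[t]) * torch.randn_like(x)
        else:
            x = mean
    return x.clamp(-1, 1)


if __name__ == "__main__":
    content_img = torch.rand(1, 3, 64, 64) * 2 - 1
    out = dual_diffusion_stylize(content_img, TinyDenoiser(), TinyDenoiser())
    print(out.shape)  # torch.Size([1, 3, 64, 64])
```

In this reading, raising `style_weight` pushes the result toward the text-guided style branch, while lowering it (or raising the content contribution of the starting noise) favors structure preservation, which matches the content/style balance the abstract describes.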