Paper Title
Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation
Paper Authors
Paper Abstract
We present a generic image-to-image translation framework, pixel2style2pixel (pSp). Our pSp framework is based on a novel encoder network that directly generates a series of style vectors which are fed into a pretrained StyleGAN generator, forming the extended W+ latent space. We first show that our encoder can directly embed real images into W+, with no additional optimization. Next, we propose utilizing our encoder to directly solve image-to-image translation tasks, defining them as encoding problems from some input domain into the latent domain. By deviating from the standard invert first, edit later methodology used with previous StyleGAN encoders, our approach can handle a variety of tasks even when the input image is not represented in the StyleGAN domain. We show that solving translation tasks through StyleGAN significantly simplifies the training process, as no adversary is required, has better support for solving tasks without pixel-to-pixel correspondence, and inherently supports multi-modal synthesis via the resampling of styles. Finally, we demonstrate the potential of our framework on a variety of facial image-to-image translation tasks, even when compared to state-of-the-art solutions designed specifically for a single task, and further show that it can be extended beyond the human facial domain.
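The abstract describes an encoder that maps an input image directly to a set of style vectors in the extended W+ latent space of a pretrained StyleGAN generator. The following is a minimal NumPy sketch of that interface only; the shapes follow the commonly used 18 × 512 W+ layout for a 1024×1024 StyleGAN, and the linear "encoder" and averaging "generator" are illustrative placeholders, not the paper's architecture:

```python
import numpy as np

# Assumed W+ layout: 18 style vectors (one per generator layer),
# each 512-dimensional, as in a 1024x1024 StyleGAN.
NUM_STYLES, STYLE_DIM = 18, 512

def encode(image, weights):
    """Toy stand-in for the pSp encoder: maps an image to W+.

    The real pSp encoder is a feature-pyramid CNN; a single linear
    map is used here only to illustrate the input/output interface.
    """
    flat = image.reshape(-1)                      # flatten H*W*C
    codes = weights @ flat                        # (18*512,) vector
    return codes.reshape(NUM_STYLES, STYLE_DIM)   # W+ codes: 18 x 512

def generate(w_plus):
    """Toy stand-in for a pretrained StyleGAN generator: each of the
    18 style vectors would modulate one synthesis layer."""
    assert w_plus.shape == (NUM_STYLES, STYLE_DIM)
    return np.tanh(w_plus.mean(axis=0))           # placeholder output

rng = np.random.default_rng(0)
image = rng.standard_normal((64, 64, 3))          # dummy input image
weights = rng.standard_normal((NUM_STYLES * STYLE_DIM, image.size)) * 1e-3

w_plus = encode(image, weights)                   # image -> W+
out = generate(w_plus)                            # W+ -> "image"
print(w_plus.shape, out.shape)                    # (18, 512) (512,)
```

Because translation is framed as encoding into this latent domain, multi-modal synthesis amounts to resampling some of the 18 style vectors before decoding.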