Paper Title

Emerging Disentanglement in Auto-Encoder Based Unsupervised Image Content Transfer

Paper Authors

Ori Press, Tomer Galanti, Sagie Benaim, Lior Wolf

Paper Abstract

We study the problem of learning to map, in an unsupervised way, between domains A and B, such that the samples b in B contain all the information that exists in samples a in A and some additional information. For example, ignoring occlusions, B can be people with glasses, A people without, and the glasses would be the added information. When mapping a sample a from the first domain to the other domain, the missing information is replicated from an independent reference sample b in B. Thus, in the above example, we can create, for every person without glasses, a version with the glasses observed in any face image. Our solution employs a single two-pathway encoder and a single decoder for both domains. The common part of the two domains and the separate part are encoded as two vectors, and the separate part is fixed at zero for domain A. The loss terms are minimal and involve reconstruction losses for the two domains and a domain confusion term. Our analysis shows that under mild assumptions, this architecture, which is much simpler than the guided-translation methods in the literature, is enough to ensure disentanglement between the two domains. We present convincing results in a few visual domains, such as no-glasses to glasses, adding facial hair based on a reference image, etc.
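The setup described in the abstract (a two-pathway encoder, a single decoder, reconstruction losses for both domains, and a domain-confusion term, with the separate code fixed at zero for domain A) can be illustrated with a short PyTorch sketch. This is a minimal sketch based only on the abstract: the layer sizes, image resolution, loss weights, discriminator design, and the choice to model the two pathways as two small encoders are assumptions, not the authors' implementation.

```python
# Minimal sketch of the abstract's architecture: a "common" pathway E_c,
# a "separate" pathway E_s (zeroed for domain A), a single decoder D,
# reconstruction losses for both domains, and a domain-confusion term on
# the common codes. All sizes (3x64x64 images, code dimensions) are
# illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_encoder(out_dim):
    # Small convolutional encoder mapping a 3x64x64 image to a flat code.
    return nn.Sequential(
        nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),    # 64 -> 32
        nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),   # 32 -> 16
        nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(),  # 16 -> 8
        nn.Flatten(),
        nn.Linear(128 * 8 * 8, out_dim),
    )


class Decoder(nn.Module):
    # Decodes the concatenated [common, separate] code back to an image.
    def __init__(self, common_dim, sep_dim):
        super().__init__()
        self.fc = nn.Linear(common_dim + sep_dim, 128 * 8 * 8)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),  # 8 -> 16
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),   # 16 -> 32
            nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Tanh(),    # 32 -> 64
        )

    def forward(self, c, s):
        h = self.fc(torch.cat([c, s], dim=1)).view(-1, 128, 8, 8)
        return self.net(h)


common_dim, sep_dim = 128, 32
E_c = conv_encoder(common_dim)      # common pathway, shared by A and B
E_s = conv_encoder(sep_dim)         # separate pathway, meaningful only for B
D = Decoder(common_dim, sep_dim)    # single decoder for both domains
disc = nn.Sequential(               # discriminator for the confusion term
    nn.Linear(common_dim, 64), nn.ReLU(), nn.Linear(64, 1)
)


def discriminator_loss(a, b):
    # The discriminator learns to tell which domain a common code came from.
    logits_a = disc(E_c(a).detach())
    logits_b = disc(E_c(b).detach())
    return (F.binary_cross_entropy_with_logits(logits_a, torch.zeros_like(logits_a))
            + F.binary_cross_entropy_with_logits(logits_b, torch.ones_like(logits_b)))


def encoder_decoder_loss(a, b, conf_weight=0.1):
    # Domain A is reconstructed with its separate code fixed at zero.
    zeros = torch.zeros(a.size(0), sep_dim)
    rec_a = F.l1_loss(D(E_c(a), zeros), a)
    # Domain B uses both the common and the separate code.
    rec_b = F.l1_loss(D(E_c(b), E_s(b)), b)
    # Domain confusion: the common code of an A sample should fool the
    # discriminator into predicting "domain B".
    logits_a = disc(E_c(a))
    conf = F.binary_cross_entropy_with_logits(logits_a, torch.ones_like(logits_a))
    return rec_a + rec_b + conf_weight * conf


# Content transfer at test time: apply the attribute of a reference b to a,
# e.g. put the glasses seen in b onto the face in a.
a = torch.randn(4, 3, 64, 64)
b = torch.randn(4, 3, 64, 64)
a_with_attribute = D(E_c(a), E_s(b))
print(a_with_attribute.shape)  # torch.Size([4, 3, 64, 64])
```

In this reading of the abstract, zeroing the separate code for domain A is what forces the separate pathway to carry only the B-specific information (e.g. the glasses), while the confusion term pushes the common codes of the two domains toward the same distribution.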
