MagicMix: Semantic Mixing with Diffusion Models

Paper Title

MagicMix: Semantic Mixing with Diffusion Models

Authors

Liew, Jun Hao; Yan, Hanshu; Zhou, Daquan; Feng, Jiashi

Abstract

Have you ever imagined what a corgi-alike coffee machine or a tiger-alike rabbit would look like? In this work, we attempt to answer these questions by exploring a new task called semantic mixing, aiming at blending two different semantics to create a new concept (e.g., corgi + coffee machine -> corgi-alike coffee machine). Unlike style transfer, where an image is stylized according to the reference style without changing the image content, semantic blending mixes two different concepts in a semantic manner to synthesize a novel concept while preserving the spatial layout and geometry. To this end, we present MagicMix, a simple yet effective solution based on pre-trained text-conditioned diffusion models. Motivated by the progressive generation property of diffusion models, where layout/shape emerges at early denoising steps while semantically meaningful details appear at later steps during the denoising process, our method first obtains a coarse layout (either by corrupting an image or by denoising from pure Gaussian noise given a text prompt), followed by injection of a conditional prompt for semantic mixing. Our method does not require any spatial mask or re-training, yet is able to synthesize novel objects with high fidelity. To improve the mixing quality, we further devise two simple strategies to provide better control and flexibility over the synthesized content. With our method, we present our results over diverse downstream applications, including semantic style transfer, novel object synthesis, breed mixing, and concept removal, demonstrating the flexibility of our method. More results can be found on the project page https://magicmix.github.io
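
The two-stage idea in the abstract (obtain a coarse layout by corrupting an image, then denoise under a conditional prompt) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The linear noise schedule, the step count, the function names, and in particular `toy_eps_model` are assumptions made for illustration; `toy_eps_model` is a hypothetical oracle that stands in for a pre-trained text-conditioned diffusion model, and the denoising loop uses standard deterministic DDIM-style updates.

```python
import numpy as np

def make_alpha_bars(num_steps=50, beta_start=1e-3, beta_end=0.2):
    """Cumulative signal-retention schedule (linear betas are an assumption)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def corrupt(x0, t, alpha_bars, rng):
    """Forward-diffuse x0 to step t; at larger t only coarse layout survives."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise

def toy_eps_model(x_t, t, prompt_target, alpha_bars):
    """HYPOTHETICAL oracle replacing a pre-trained text-conditioned model.
    It predicts the noise as if the clean image were exactly prompt_target."""
    return (x_t - np.sqrt(alpha_bars[t]) * prompt_target) / np.sqrt(1.0 - alpha_bars[t])

def denoise_with_prompt(x_t, t_start, prompt_target, alpha_bars, eps_model):
    """Deterministic DDIM-style updates from step t_start down to step 0."""
    x = x_t
    for t in range(t_start, -1, -1):
        eps = eps_model(x, t, prompt_target, alpha_bars)
        # Estimate the clean image implied by the predicted noise.
        x0_pred = (x - np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alpha_bars[t])
        if t == 0:
            x = x0_pred
        else:
            # Re-noise the estimate to the previous (less noisy) step.
            x = np.sqrt(alpha_bars[t - 1]) * x0_pred + np.sqrt(1.0 - alpha_bars[t - 1]) * eps
    return x

rng = np.random.default_rng(0)
alpha_bars = make_alpha_bars()

layout = np.zeros((8, 8))
layout[2:6, 2:6] = 1.0                  # toy "layout image": a bright square
prompt_target = np.full((8, 8), 0.5)    # toy stand-in for the prompt's content
k_max = 30                              # intermediate step (assumed value)

noisy_layout = corrupt(layout, k_max, alpha_bars, rng)
mixed = denoise_with_prompt(noisy_layout, k_max, prompt_target, alpha_bars, toy_eps_model)
```

Because the oracle ignores its input's layout, `mixed` collapses to `prompt_target` here. With a real pre-trained model, the noise prediction depends on the noisy input, which is how the corrupted image's layout can carry through to the result. The sketch only shows the control flow: corrupt to an intermediate step, then denoise with the prompt.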
