Synthesizing human-like sketches from natural images using a conditional convolutional decoder

Paper title

Synthesizing human-like sketches from natural images using a conditional convolutional decoder

Authors

Moritz Kampelmühler, Axel Pinz

Abstract

Humans are able to precisely communicate diverse concepts by employing sketches, a highly reduced and abstract shape-based representation of visual content. We propose, for the first time, a fully convolutional end-to-end architecture that is able to synthesize human-like sketches of objects in natural images with potentially cluttered backgrounds. To enable an architecture to learn this highly abstract mapping, we employ the following key components: (1) a fully convolutional encoder-decoder structure, (2) a perceptual similarity loss function operating in an abstract feature space and (3) conditioning of the decoder on the label of the object that shall be sketched. Given the combination of these architectural concepts, we can train our structure in an end-to-end supervised fashion on a collection of sketch-image pairs. The generated sketches of our architecture can be classified with 85.6% Top-5 accuracy and we verify their visual quality via a user study. We find that deep features as a perceptual similarity metric enable image translation with large domain gaps and our findings further show that convolutional neural networks trained on image classification tasks implicitly learn to encode shape information. Code is available at https://github.com/kampelmuehler/synthesizing_human_like_sketches.
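
Two of the components named in the abstract, label conditioning of the decoder and a loss computed in feature space rather than pixel space, can be illustrated with a small dependency-free sketch. This is not the authors' implementation (see the linked repository for that). The abstract does not say how the decoder is conditioned, so concatenating a tiled one-hot label as extra channels is an assumption made here for illustration. The names `condition_on_label`, `perceptual_loss` and `toy_features` are hypothetical, and `toy_features` is only a stand-in for the deep features of a pretrained classification network.

```python
from typing import Callable, List

Image = List[List[float]]             # height x width
FeatureMap = List[List[List[float]]]  # channels x height x width


def one_hot(label: int, num_classes: int) -> List[float]:
    """Return a one-hot vector for an integer class label."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def condition_on_label(features: FeatureMap, label: int, num_classes: int) -> FeatureMap:
    """Append one constant channel per class to a C x H x W feature map.

    Assumption: conditioning by channel-wise concatenation of a tiled
    one-hot label; the abstract does not specify the mechanism.
    """
    height, width = len(features[0]), len(features[0][0])
    label_channels = [
        [[value] * width for _ in range(height)]
        for value in one_hot(label, num_classes)
    ]
    return features + label_channels  # new list; the input is not mutated


def toy_features(image: Image) -> List[float]:
    """Hypothetical feature extractor: row means followed by column means."""
    rows = [sum(row) / len(row) for row in image]
    cols = [sum(col) / len(col) for col in zip(*image)]
    return rows + cols


def perceptual_loss(extract: Callable[[Image], List[float]],
                    generated: Image, target: Image) -> float:
    """Mean squared distance between the feature vectors of two images."""
    f_gen, f_tgt = extract(generated), extract(target)
    if len(f_gen) != len(f_tgt):
        raise ValueError("feature vectors must have equal length")
    return sum((a - b) ** 2 for a, b in zip(f_gen, f_tgt)) / len(f_gen)


if __name__ == "__main__":
    encoded = [[[0.5, 0.5], [0.5, 0.5]]]  # 1 x 2 x 2 encoder output
    conditioned = condition_on_label(encoded, label=1, num_classes=3)
    print(len(conditioned))  # 1 feature channel + 3 label channels

    sketch = [[1.0, 0.0], [0.0, 1.0]]
    other = [[1.0, 1.0], [0.0, 0.0]]
    print(perceptual_loss(toy_features, sketch, other))
```

In a real system the feature extractor would be a fixed, pretrained network and the loss would be backpropagated through the decoder; the sketch above only shows how the pieces fit together.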
