Paper Title


CLIP-Art: Contrastive Pre-training for Fine-Grained Art Classification

Paper Authors

Conde, Marcos V., Turgutlu, Kerem

Paper Abstract


Existing computer vision research on artwork struggles with fine-grained attribute recognition and with the lack of curated annotated datasets, whose creation is costly. To the best of our knowledge, ours is one of the first methods to use CLIP (Contrastive Language-Image Pre-training) to train a neural network on a variety of artwork image and text description pairs. CLIP can learn directly from free-form art descriptions or, when available, from curated fine-grained labels. The model's zero-shot capability allows it to predict an accurate natural language description for a given image without being directly optimized for the task. Our approach aims to solve two challenges: instance retrieval and fine-grained artwork attribute recognition. We use the iMet Dataset, which we consider the largest annotated artwork dataset. On this benchmark we achieve competitive results using only self-supervision.
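To make the contrastive pre-training objective concrete, below is a minimal NumPy sketch of the symmetric InfoNCE-style loss that CLIP optimizes over a batch of matched image/text embedding pairs. The function name and the temperature value are illustrative, not taken from the paper; a real implementation would use learned encoders and a learnable temperature.

```python
import numpy as np

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive (InfoNCE-style) loss over a batch.

    image_emb, text_emb: (N, D) arrays; row i of each is a matched pair.
    The matched pair is the positive; all other rows in the batch are negatives.
    """
    # L2-normalize so the dot product is cosine similarity
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    # Pairwise similarity logits, scaled by temperature
    logits = img @ txt.T / temperature  # (N, N)
    n = logits.shape[0]
    labels = np.arange(n)  # row i should match column i

    def cross_entropy(l):
        # Numerically stable log-softmax per row, then pick the diagonal
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Average the image->text and text->image directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

With perfectly matched embeddings the loss is near zero, while misaligned pairs drive it up; the symmetric form trains both the image and text encoders toward a shared embedding space.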
