Paper Title
Exploring CLIP for Assessing the Look and Feel of Images
Paper Authors
Paper Abstract
Measuring the perception of visual content is a long-standing problem in computer vision. Many mathematical models have been developed to evaluate the look or quality of an image. Despite the effectiveness of such tools in quantifying degradations such as noise and blurriness levels, such quantification is loosely coupled with human language. When it comes to the more abstract perception of the feel of visual content, existing methods can only rely on supervised models that are explicitly trained with labeled data collected via laborious user studies. In this paper, we go beyond the conventional paradigms by exploring the rich visual language prior encapsulated in Contrastive Language-Image Pre-training (CLIP) models for assessing both the quality perception (look) and abstract perception (feel) of images in a zero-shot manner. In particular, we discuss effective prompt designs and show an effective prompt pairing strategy to harness the prior. We also provide extensive experiments on controlled datasets and Image Quality Assessment (IQA) benchmarks. Our results show that CLIP captures meaningful priors that generalize well to different perceptual assessments. Code is available at https://github.com/IceClear/CLIP-IQA.
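The prompt pairing strategy described in the abstract can be illustrated with a minimal sketch: embed an image with CLIP, compare it against an antonym prompt pair, and take a softmax over the two similarities to obtain a score. The sketch below assumes the Hugging Face transformers CLIP API; the checkpoint, the "Good photo." / "Bad photo." pair, and the `prompt_pair_score` helper are illustrative assumptions, not the paper's exact configuration (see the released code at the URL above for the authors' implementation).

```python
# Illustrative sketch of CLIP-based prompt pairing for zero-shot quality
# assessment. Assumptions: Hugging Face `transformers` CLIP API, the
# openai/clip-vit-base-patch32 checkpoint, and a generic antonym prompt pair.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def prompt_pair_score(image: Image.Image,
                      pos: str = "Good photo.",
                      neg: str = "Bad photo.") -> float:
    """Return a score in [0, 1]; higher means closer to the positive prompt."""
    inputs = processor(text=[pos, neg], images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # logits_per_image holds the scaled cosine similarities between the image
    # embedding and each of the two prompt embeddings; softmax over the pair
    # converts them into a relative preference for the positive prompt.
    probs = outputs.logits_per_image.softmax(dim=-1)
    return probs[0, 0].item()

score = prompt_pair_score(Image.open("example.jpg"))
print(f"quality score: {score:.3f}")
```

Pairing antonym prompts, rather than thresholding a single prompt's similarity, normalizes away prompt-specific biases in the CLIP text embedding: only the relative similarity between the two poles drives the score.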