Paper Title

Cross-Forgery Analysis of Vision Transformers and CNNs for Deepfake Image Detection

Paper Authors

Davide Alessandro Coccomini, Roberto Caldelli, Fabrizio Falchi, Claudio Gennaro, Giuseppe Amato

Paper Abstract

Deepfake generation techniques are evolving at a rapid pace, making it possible to create realistic manipulated images and videos and endangering the serenity of modern society. The continual emergence of new and varied techniques brings with it a further problem to be faced, namely the ability of deepfake detection models to update themselves promptly in order to identify manipulations carried out using even the most recent methods. This is an extremely complex problem to solve, as training a model requires large amounts of data, which are difficult to obtain if the deepfake generation method is too recent. Moreover, continuously retraining a network would be unfeasible. In this paper, we ask ourselves if, among the various deep learning techniques, there is one that is able to generalise the concept of deepfake to such an extent that it does not remain tied to one or more specific deepfake generation methods used in the training set. We compared a Vision Transformer with an EfficientNetV2 in a cross-forgery context based on the ForgeryNet dataset. From our experiments, it emerges that EfficientNetV2 has a greater tendency to specialise, often obtaining better results on the generation methods seen during training, while Vision Transformers exhibit a superior generalisation ability that makes them competent even on images generated with new methodologies.
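To make the cross-forgery setting concrete, the sketch below trains a binary real-vs-fake classifier on images produced by one set of generation methods and reports accuracy on images from methods never seen in training. It is a minimal illustration only: the `timm` model names, the dummy per-method batches, and the omitted fine-tuning step are assumptions for demonstration, not the exact variants or pipeline used in the paper.

```python
# Minimal sketch of a cross-forgery evaluation: fine-tune on "seen" generation
# methods, then measure accuracy per method, including methods held out of training.
# Model names and the dummy data are illustrative assumptions, not the paper's setup.
import torch
import torch.nn as nn
import timm


def build_model(name: str) -> nn.Module:
    # Two output logits: real vs. fake.
    return timm.create_model(name, pretrained=True, num_classes=2)


@torch.no_grad()
def accuracy(model: nn.Module, images: torch.Tensor, labels: torch.Tensor) -> float:
    model.eval()
    preds = model(images).argmax(dim=1)
    return (preds == labels).float().mean().item()


if __name__ == "__main__":
    # Dummy 224x224 RGB batches standing in for ForgeryNet images grouped by
    # generation method; in practice these would come from a real data loader.
    seen_methods = {"method_A": (torch.randn(8, 3, 224, 224), torch.randint(0, 2, (8,)))}
    unseen_methods = {"method_X": (torch.randn(8, 3, 224, 224), torch.randint(0, 2, (8,)))}

    for name in ("vit_base_patch16_224", "tf_efficientnetv2_s"):
        model = build_model(name)
        # ... fine-tune on the "seen" methods here (omitted for brevity) ...
        for split, batches in (("seen", seen_methods), ("unseen", unseen_methods)):
            for method, (x, y) in batches.items():
                print(f"{name} | {split} | {method}: acc={accuracy(model, x, y):.2f}")
```

Comparing the per-method accuracies of the two backbones on the unseen split is what reveals the generalisation gap discussed in the abstract.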
