Paper Title

Multimodal Fake News Detection via CLIP-Guided Learning

Paper Authors

Yangming Zhou, Qichao Ying, Zhenxing Qian, Sheng Li, Xinpeng Zhang

Paper Abstract

Multimodal fake news detection has attracted much research interest in social forensics. Many existing approaches introduce tailored attention mechanisms to guide the fusion of unimodal features. However, how the similarity of these features should be computed, and how it affects the decision-making process in fake news detection (FND), remain open questions. Besides, the potential of pretrained multimodal feature learning models for fake news detection has not been well exploited. This paper proposes FND-CLIP, a multimodal Fake News Detection framework based on Contrastive Language-Image Pretraining (CLIP). Given a piece of multimodal news under examination, we extract deep representations from the image and text using a ResNet-based encoder, a BERT-based encoder, and two pair-wise CLIP encoders. The multimodal feature is a concatenation of the CLIP-generated features weighted by the standardized cross-modal similarity of the two modalities. The extracted features are further processed for redundancy reduction before being fed into the final classifier. We introduce a modality-wise attention module to adaptively reweight and aggregate the features. Extensive experiments on typical fake news datasets show that the proposed framework is better at mining the features crucial for fake news detection. FND-CLIP outperforms previous works, improving overall accuracy by 0.7%, 6.8%, and 1.3% on Weibo, PolitiFact, and GossipCop, respectively. Besides, we show that CLIP-based learning allows greater flexibility in multimodal feature selection.

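For readers who want a concrete picture of the fusion step described in the abstract, the following is a minimal PyTorch sketch of the idea: the agreement between the paired CLIP encoders is used to weight the CLIP-generated features before concatenation, modality-wise attention, and classification. The dimensions, module names, normalization, and attention formula here are illustrative assumptions, not the authors' released implementation.

```python
# Illustrative sketch of CLIP-similarity-weighted fusion (assumed details, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class CLIPGuidedFusion(nn.Module):
    def __init__(self, text_dim=768, image_dim=2048, clip_dim=512, fused_dim=256):
        super().__init__()
        # Project each unimodal feature into a shared space before fusion.
        self.text_proj = nn.Linear(text_dim, fused_dim)    # BERT [CLS] feature
        self.image_proj = nn.Linear(image_dim, fused_dim)  # ResNet pooled feature
        self.clip_text_proj = nn.Linear(clip_dim, fused_dim)
        self.clip_image_proj = nn.Linear(clip_dim, fused_dim)
        # Modality-wise attention: one scalar weight per feature branch.
        self.modality_attn = nn.Sequential(
            nn.Linear(4 * fused_dim, 4), nn.Softmax(dim=-1)
        )
        self.classifier = nn.Sequential(
            nn.Linear(4 * fused_dim, 64), nn.ReLU(), nn.Linear(64, 2)
        )

    def forward(self, bert_feat, resnet_feat, clip_text_feat, clip_image_feat):
        # Cross-modal similarity from the paired CLIP encoders, mapped to [0, 1]
        # (a simple stand-in for the "standardized" similarity in the abstract).
        sim = F.cosine_similarity(clip_text_feat, clip_image_feat, dim=-1)
        weight = ((sim + 1.0) / 2.0).unsqueeze(-1)

        t = self.text_proj(bert_feat)
        v = self.image_proj(resnet_feat)
        # CLIP-generated features are scaled by the image-text agreement weight.
        ct = self.clip_text_proj(clip_text_feat) * weight
        cv = self.clip_image_proj(clip_image_feat) * weight

        fused = torch.cat([t, v, ct, cv], dim=-1)
        # Modality-wise attention re-weights each branch before classification.
        attn = self.modality_attn(fused)                    # (batch, 4)
        branches = torch.stack([t, v, ct, cv], dim=1)       # (batch, 4, fused_dim)
        fused = (attn.unsqueeze(-1) * branches).flatten(1)  # (batch, 4 * fused_dim)
        return self.classifier(fused)


if __name__ == "__main__":
    model = CLIPGuidedFusion()
    logits = model(
        torch.randn(2, 768), torch.randn(2, 2048),
        torch.randn(2, 512), torch.randn(2, 512),
    )
    print(logits.shape)  # torch.Size([2, 2]) -> real/fake logits
```

In this sketch, a low CLIP image-text similarity suppresses the CLIP branches, so the classifier leans more on the unimodal BERT and ResNet features; this mirrors the flexibility in multimodal feature selection that the abstract attributes to CLIP-based learning.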