Face Pyramid Vision Transformer

Paper Title

Face Pyramid Vision Transformer

Authors

Khawar Islam, Muhammad Zaigham Zaheer, Arif Mahmood

Abstract

A novel Face Pyramid Vision Transformer (FPVT) is proposed to learn discriminative multi-scale facial representations for face recognition and verification. In FPVT, Face Spatial Reduction Attention (FSRA) and Dimensionality Reduction (FDR) layers are employed to make the feature maps compact, thus reducing computation. An Improved Patch Embedding (IPE) algorithm is proposed to exploit the benefits of CNNs in ViTs (e.g., shared weights, local context, and receptive fields) and to model features ranging from lower-level edges to higher-level semantic primitives. Within the FPVT framework, a Convolutional Feed-Forward Network (CFFN) is proposed that extracts locality information to learn low-level facial information. The proposed FPVT is evaluated on seven benchmark datasets and compared with ten existing state-of-the-art methods, including CNNs, pure ViTs, and convolutional ViTs. Despite having fewer parameters, FPVT demonstrates excellent performance compared with these methods. The project page is available at https://khawar-islam.github.io/fpvt/
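
The abstract does not describe the internals of FSRA, so the following is only a rough sketch of why compacting the feature map can reduce attention cost. It assumes the generic spatial-reduction idea, in which keys and values are computed on a feature map downsampled by a ratio `reduction` along each spatial axis while queries keep full resolution. The function name `attention_mult_adds`, the cost model (only the two attention matrix products are counted; projections and the reduction step itself are ignored), and the example sizes are all illustrative assumptions, not values from the paper.

```python
def attention_mult_adds(height, width, dim, reduction=1):
    """Approximate multiply-adds for the two attention matrix products.

    Hypothetical cost model: queries use the full height x width map,
    while keys/values use a map downsampled by `reduction` per axis.
    """
    if height % reduction or width % reduction:
        raise ValueError("reduction must divide both height and width")
    num_queries = height * width
    num_keys = (height // reduction) * (width // reduction)
    # Q @ K^T costs num_queries * num_keys * dim; attention @ V costs the same.
    return 2 * num_queries * num_keys * dim


# Example sizes are assumptions chosen for illustration only.
full = attention_mult_adds(28, 28, 64)
reduced = attention_mult_adds(28, 28, 64, reduction=4)
print(full // reduced)  # 16
```

Under this simplified model, a reduction ratio of r shrinks the number of keys by r squared, so the cost of the attention products drops by the same factor.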
