Paper Title
Minutiae-Guided Fingerprint Embeddings via Vision Transformers
Paper Authors
Paper Abstract
Minutiae matching has long dominated the field of fingerprint recognition. However, deep networks can be used to extract fixed-length embeddings from fingerprints. To date, the few studies that have explored CNN architectures for extracting such embeddings have shown great promise. Inspired by these early works, we propose the first use of a Vision Transformer (ViT) to learn a discriminative, fixed-length fingerprint embedding. We further demonstrate that by guiding the ViT to focus on local, minutiae-related features, we can boost recognition performance. Finally, we show that by fusing embeddings learned by CNNs and ViTs, we can reach near parity with a commercial state-of-the-art (SOTA) matcher. In particular, we obtain TAR=94.23% @ FAR=0.1% on the NIST SD 302 public-domain dataset, compared to TAR=96.71% @ FAR=0.1% for the SOTA commercial matcher. Additionally, our fixed-length embeddings can be matched more than an order of magnitude faster than the commercial system (2.5 million matches/second vs. 50K matches/second). We make our code and models publicly available to encourage further research on this topic: https://github.com/tba.
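To make the speed claim concrete, the sketch below (not the authors' released code) shows why fixed-length embeddings admit very fast matching: comparison reduces to batched dot products over a gallery matrix. The concatenation-based fusion of the CNN and ViT embeddings and the 384-dimensional embedding size are illustrative assumptions; the abstract only states that the two embeddings are fused.

```python
# A minimal sketch of fixed-length embedding fusion and gallery matching.
# Fusion-by-concatenation and the 384-d embedding size are assumptions,
# not the paper's confirmed design.
import numpy as np

def fuse(cnn_emb: np.ndarray, vit_emb: np.ndarray) -> np.ndarray:
    """Concatenate per-network embeddings and L2-normalize (assumed scheme)."""
    fused = np.concatenate([cnn_emb, vit_emb], axis=-1)
    return fused / np.linalg.norm(fused, axis=-1, keepdims=True)

# Hypothetical gallery of 100k fingerprints with random stand-in embeddings.
rng = np.random.default_rng(0)
gallery = fuse(rng.standard_normal((100_000, 384)),
               rng.standard_normal((100_000, 384)))
probe = fuse(rng.standard_normal((1, 384)),
             rng.standard_normal((1, 384)))

# A single matrix-vector product scores the probe against the whole gallery;
# this is what makes millions of matches/second feasible, versus the
# per-pair minutiae alignment a conventional matcher must perform.
scores = gallery @ probe.T            # cosine similarities, shape (100000, 1)
best = int(np.argmax(scores))
print(f"best match: id {best}, score {scores[best, 0]:.4f}")
```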