Multi-channel Deep 3D Face Recognition

Paper Title

Multi-channel Deep 3D Face Recognition

Authors

You, Zhiqian; Yang, Tingting; Jin, Miao

Abstract

Face recognition is important in many applications as a biometric because of its throughput, convenience, and non-invasiveness. Recent advances in deep convolutional neural network (CNN) architectures have significantly improved face recognition based on two-dimensional (2D) facial texture images, outperforming the previous state of the art built on conventional methods. However, the accuracy of 2D face recognition is still challenged by changes in pose, illumination, make-up, and expression. The geometric information contained in three-dimensional (3D) face data, on the other hand, has the potential to overcome these fundamental limitations of 2D face data. We propose a multi-channel deep 3D face network for face recognition based on 3D face data. We compute the geometric information of a 3D face from its piecewise-linear triangular mesh structure and then conformally flatten the geometric information, along with the color, from 3D onto the 2D plane in order to leverage state-of-the-art deep CNN architectures. We modify the input layer of the network to take images with nine channels instead of only three, so that more geometric information can be fed to it explicitly. We pre-train the network using images from VGG-Face \cite{Parkhi2015} and then fine-tune it with the generated multi-channel face images. The multi-channel deep 3D face network achieves a face recognition accuracy of 98.6%. The experimental results also clearly show that the network performs much better when the 9-channel image is flattened to the plane with a conformal map than with an orthographic projection.
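The abstract says the input layer is changed to accept nine channels while the network is still pre-trained on three-channel images, but it does not say how the extra input filters are initialized. The sketch below shows one common heuristic, offered purely as an assumption and not as the authors' procedure: copy the pretrained three-channel filters cyclically across the nine input channels and rescale them so that the layer's response to a replicated input stays the same. The function name `expand_input_channels` and the nested-list weight layout `[out][in][kh][kw]` are hypothetical and chosen only for illustration.

```python
def expand_input_channels(weights, new_in_channels):
    """Expand first-layer conv weights to accept more input channels.

    weights: nested lists shaped [out][in][kh][kw] (assumed layout).
    The pretrained input-channel filters are reused cyclically and scaled
    by old_in / new_in_channels. When new_in_channels is a multiple of
    old_in, the summed response to an input whose channels are repeated
    copies of the original channels is unchanged.
    """
    old_in = len(weights[0])
    scale = old_in / new_in_channels
    expanded = []
    for out_filter in weights:
        new_filter = []
        for c in range(new_in_channels):
            src = out_filter[c % old_in]  # reuse pretrained filter cyclically
            new_filter.append([[w * scale for w in row] for row in src])
        expanded.append(new_filter)
    return expanded


# Toy stand-in for pretrained weights: 1 output filter, 3 input channels, 1x1 kernel.
pretrained = [[[[1.0]], [[2.0]], [[3.0]]]]
nine_channel = expand_input_channels(pretrained, 9)
```

After an initialization like this, all nine input filters would be updated during fine-tuning on the multi-channel face images, so the copied values serve only as a starting point.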
