An Effective Deep Network for Head Pose Estimation without Keypoints

Paper title

An Effective Deep Network for Head Pose Estimation without Keypoints

Authors

Thai, Chien; Tran, Viet; Bui, Minh; Ninh, Huong; Tran, Hai

Abstract

In recent years, human head pose estimation has become an essential problem in facial analysis, with many computer vision applications such as gaze estimation, virtual reality, and driver assistance. Given the importance of this problem, a compact model is needed that lowers the computational cost of deploying facial-analysis applications, such as large camera surveillance systems and AI cameras, while maintaining accuracy. In this work, we propose a lightweight model that effectively addresses head pose estimation. Our approach has two main steps: 1) we first train multiple teacher models on the synthetic dataset 300W-LPA to obtain head pose pseudo labels; 2) we design an architecture with a ResNet18 backbone and train it on the ensemble of these pseudo labels via knowledge distillation. To evaluate the effectiveness of our model, we use two real-world head pose datasets, AFLW-2000 and BIWI. Experimental results show that our proposed model significantly improves accuracy compared with state-of-the-art head pose estimation methods. Furthermore, our model runs in real time at ~300 FPS during inference on a Tesla V100.
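The two training steps (teacher pseudo labels, then distillation into a student) can be illustrated with a minimal sketch of the label arithmetic only. It assumes that each teacher outputs (yaw, pitch, roll) Euler angles in degrees, that the teacher ensemble is a plain per-angle mean, and that the student is trained with an L1 loss against the ensembled pseudo labels. The abstract specifies neither the ensembling rule nor the loss, so these are assumptions, and the helper names `ensemble_pseudo_labels` and `distillation_loss` are hypothetical. The teacher networks and the ResNet18 student are omitted.

```python
from statistics import mean
from typing import List, Sequence, Tuple

# A head pose as (yaw, pitch, roll) Euler angles in degrees (assumed representation).
Pose = Tuple[float, float, float]


def ensemble_pseudo_labels(teacher_preds: Sequence[Sequence[Pose]]) -> List[Pose]:
    """Combine teacher predictions into one pseudo label per image (hypothetical helper).

    teacher_preds[t][i] is teacher t's pose for image i. The per-angle mean used
    here is an assumed ensembling rule, not necessarily the one in the paper.
    """
    if not teacher_preds:
        raise ValueError("need at least one teacher")
    n_images = len(teacher_preds[0])
    if any(len(preds) != n_images for preds in teacher_preds):
        raise ValueError("all teachers must label the same images")

    labels: List[Pose] = []
    for i in range(n_images):
        poses_for_image = [preds[i] for preds in teacher_preds]
        # zip(*...) regroups the poses into (all yaws, all pitches, all rolls).
        yaw, pitch, roll = (mean(angles) for angles in zip(*poses_for_image))
        labels.append((yaw, pitch, roll))
    return labels


def distillation_loss(student_preds: Sequence[Pose], pseudo_labels: Sequence[Pose]) -> float:
    """Assumed L1 objective: mean absolute angle error (degrees) vs. the pseudo labels."""
    if len(student_preds) != len(pseudo_labels) or not pseudo_labels:
        raise ValueError("need matching, non-empty prediction and label lists")
    errors = [
        abs(s - t)
        for student, target in zip(student_preds, pseudo_labels)
        for s, t in zip(student, target)
    ]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Two hypothetical teachers labelling the same two images.
    teachers = [
        [(10.0, -5.0, 2.0), (0.0, 0.0, 0.0)],
        [(12.0, -3.0, 0.0), (2.0, 2.0, -2.0)],
    ]
    labels = ensemble_pseudo_labels(teachers)
    print(labels)  # [(11.0, -4.0, 1.0), (1.0, 1.0, -1.0)]
    student = [(11.0, -4.0, 4.0), (1.0, 1.0, -1.0)]
    print(distillation_loss(student, labels))  # 0.5
```

Averaging Euler angles directly is a simplification: it is only reasonable when the teachers' predictions for an image are close to each other and far from the ±180° wrap-around point.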
