Paper Title
360-Degree Gaze Estimation in the Wild Using Multiple Zoom Scales
Paper Authors
Paper Abstract
Gaze estimation involves predicting where a person in an image or video is looking. Technically, the gaze information can be inferred from two different magnification levels: face orientation and eye orientation. Inferring gaze from the eyes is not always feasible in the wild, given the lack of clear eye patches under conditions such as extreme left/right gazes or occlusions. In this work, we design a model that mimics the human ability to estimate gaze by aggregating information from focused looks, each at a different magnification level of the face area. The model avoids the need to extract clear eye patches and at the same time addresses another important issue, face-scale variation, for gaze estimation in the wild. We further extend the model to handle the challenging task of 360-degree gaze estimation by encoding backward gazes in a polar representation along with a robust averaging scheme. Experimental results on the ETH-XGaze dataset, which does not contain scale-varying faces, demonstrate the model's effectiveness in assimilating information from multiple scales. On other benchmark datasets with many scale-varying faces (Gaze360 and RT-GENE), the proposed model achieves state-of-the-art gaze-estimation performance when using either images or videos. Our code and pretrained models can be accessed at https://github.com/ashesh-0/MultiZoomGaze.
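To make the multi-zoom idea from the abstract concrete, below is a minimal PyTorch sketch of one plausible realization: several center crops (zoom levels) of the same face image are passed through a shared backbone, and the resulting features are aggregated before regressing the gaze angles. This is an illustrative assumption, not the authors' implementation (see the MultiZoomGaze repository for that); the backbone choice, crop factors, and head design here are all hypothetical.

```python
# Illustrative sketch of multi-zoom gaze aggregation; not the authors' code.
import torch
import torch.nn as nn
import torchvision.models as models
import torchvision.transforms.functional as TF


class MultiZoomGazeSketch(nn.Module):
    """Aggregates gaze features from several center crops (zoom levels) of a face image."""

    def __init__(self, zoom_factors=(1.0, 0.8, 0.6)):  # hypothetical zoom levels
        super().__init__()
        self.zoom_factors = zoom_factors
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()  # one shared backbone across all zooms
        self.backbone = backbone
        # Predict (yaw, pitch). For 360-degree gaze, a polar encoding would
        # instead predict sin/cos components so backward gazes avoid the
        # wrap-around discontinuity at +/-180 degrees.
        self.head = nn.Linear(512 * len(zoom_factors), 2)

    def forward(self, face):  # face: (B, 3, H, W)
        _, _, h, w = face.shape
        feats = []
        for z in self.zoom_factors:
            crop = TF.center_crop(face, [int(h * z), int(w * z)])
            crop = TF.resize(crop, [224, 224], antialias=True)
            feats.append(self.backbone(crop))
        # Concatenate per-zoom features and regress the gaze angles.
        return self.head(torch.cat(feats, dim=1))  # (B, 2), yaw/pitch in radians


model = MultiZoomGazeSketch()
gaze = model(torch.randn(2, 3, 224, 224))  # -> tensor of shape (2, 2)
```

A sin/cos (polar) output also makes the abstract's "robust averaging" natural: multiple angular predictions can be averaged component-wise and the angle recovered with atan2, rather than averaging raw angles across the wrap-around point.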