Paper Title
360-Degree Gaze Estimation in the Wild Using Multiple Zoom Scales
Paper Authors
Paper Abstract
Gaze estimation involves predicting where a person in an image or video is looking. Technically, the gaze information can be inferred from two different magnification levels: face orientation and eye orientation. Inferring gaze from the eyes is not always feasible in the wild, given the lack of clear eye patches under conditions such as extreme left/right gazes or occlusions. In this work, we design a model that mimics the human ability to estimate gaze by aggregating information from focused looks, each at a different magnification level of the face area. The model avoids the need to extract clear eye patches and at the same time addresses another important issue, face-scale variation, for gaze estimation in the wild. We further extend the model to handle the challenging task of 360-degree gaze estimation by encoding backward gazes in a polar representation along with a robust averaging scheme. Experimental results on the ETH-XGaze dataset, which does not contain scale-varying faces, demonstrate the model's effectiveness in assimilating information from multiple scales. On other benchmark datasets with many scale-varying faces (Gaze360 and RT-GENE), the proposed model achieves state-of-the-art gaze-estimation performance when using either images or videos. Our code and pretrained models can be accessed at https://github.com/ashesh-0/MultiZoomGaze.
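To make the multi-zoom idea from the abstract concrete, below is a minimal PyTorch sketch of one plausible realization: several center crops (zoom levels) of the same face image are passed through a shared backbone, and the resulting features are aggregated before regressing the gaze angles. This is an illustrative assumption, not the authors' implementation (see the MultiZoomGaze repository for that); the backbone choice, crop factors, and head design here are all hypothetical.

```python
# Illustrative sketch of multi-zoom gaze aggregation; not the authors' code.
import torch
import torch.nn as nn
import torchvision.models as models
import torchvision.transforms.functional as TF


class MultiZoomGazeSketch(nn.Module):
    """Aggregates gaze features from several center crops (zoom levels) of a face image."""

    def __init__(self, zoom_factors=(1.0, 0.8, 0.6)):  # hypothetical zoom levels
        super().__init__()
        self.zoom_factors = zoom_factors
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()  # one shared backbone across all zooms
        self.backbone = backbone
        # Predict (yaw, pitch). For 360-degree gaze, a polar encoding would
        # instead predict sin/cos components so backward gazes avoid the
        # wrap-around discontinuity at +/-180 degrees.
        self.head = nn.Linear(512 * len(zoom_factors), 2)

    def forward(self, face):  # face: (B, 3, H, W)
        _, _, h, w = face.shape
        feats = []
        for z in self.zoom_factors:
            crop = TF.center_crop(face, [int(h * z), int(w * z)])
            crop = TF.resize(crop, [224, 224], antialias=True)
            feats.append(self.backbone(crop))
        # Concatenate per-zoom features and regress the gaze angles.
        return self.head(torch.cat(feats, dim=1))  # (B, 2), yaw/pitch in radians


model = MultiZoomGazeSketch()
gaze = model(torch.randn(2, 3, 224, 224))  # -> tensor of shape (2, 2)
```

A sin/cos (polar) output also makes the abstract's "robust averaging" natural: multiple angular predictions can be averaged component-wise and the angle recovered with atan2, rather than averaging raw angles across the wrap-around point.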