Paper Title
Towards Self-Supervised Gaze Estimation
Paper Authors
Abstract
Recent joint embedding-based self-supervised methods have surpassed standard supervised approaches on various image recognition tasks such as image classification. These self-supervised methods aim to maximize agreement between features extracted from two differently transformed views of the same image, which results in learning representations that are invariant to appearance and geometric image transformations. However, the effectiveness of these approaches remains unclear in the context of gaze estimation, a structured regression task that requires equivariance under geometric transformations (e.g., rotations, horizontal flips). In this work, we propose SwAT, an equivariant version of the online clustering-based self-supervised approach SwAV, to learn more informative representations for gaze estimation. We demonstrate that SwAT, with a ResNet-50 backbone and supported by uncurated, unlabeled face images, outperforms state-of-the-art gaze estimation methods and supervised baselines in various experiments. In particular, we achieve up to 57% and 25% improvements on cross-dataset and within-dataset evaluation tasks, respectively, on existing benchmarks (ETH-XGaze, Gaze360, and MPIIFaceGaze).
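The abstract's distinction between invariance and equivariance can be made concrete with a small sketch (illustrative only, not the paper's code; the helper names are hypothetical). Under a horizontal flip of a face image, the gaze direction is mirrored left-right, so the yaw angle changes sign while the pitch is unchanged. An equivariant model f must satisfy f(flip(image)) == flip_gaze(f(image)); an invariant model, which ignores the flip entirely, cannot represent this.

```python
import numpy as np

def flip_gaze(gaze):
    """Gaze-label transform induced by horizontally flipping the image:
    pitch is unchanged, yaw changes sign (left-right mirror)."""
    pitch, yaw = gaze
    return np.array([pitch, -yaw])

def rotate_gaze(gaze, angle_rad):
    """Gaze-label transform induced by an in-plane (roll) rotation of the
    image, approximated here as a 2D rotation of the (pitch, yaw) vector."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    pitch, yaw = gaze
    return np.array([c * pitch - s * yaw, s * pitch + c * yaw])

gaze = np.array([0.1, 0.3])           # (pitch, yaw) in radians
mirrored = flip_gaze(gaze)            # yaw negated: [0.1, -0.3]
restored = flip_gaze(mirrored)        # flip is an involution: back to original
print(mirrored, restored)
```

The point of the sketch is that geometric augmentations act non-trivially on the regression target, so a self-supervised objective for gaze estimation must track these label transforms rather than discard them, which is the motivation the abstract gives for making SwAV equivariant.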