Paper Title

Modeling human visual search: A combined Bayesian searcher and saliency map approach for eye movement guidance in natural scenes

Paper Authors

Sclar, M., Bujia, G., Vita, S., Solovey, G., Kamienkowski, J. E.

Paper Abstract

Finding objects is essential for almost any daily-life visual task. Saliency models have been useful to predict fixation locations in natural images, but are static, i.e., they provide no information about the time-sequence of fixations. Nowadays, one of the biggest challenges in the field is to go beyond saliency maps to predict a sequence of fixations related to a visual task, such as searching for a given target. Bayesian observer models have been proposed for this task, as they represent visual search as an active sampling process. Nevertheless, they were mostly evaluated on artificial images, and how they adapt to natural images remains largely unexplored. Here, we propose a unified Bayesian model for visual search guided by saliency maps as prior information. We validated our model with a visual search experiment in natural scenes recording eye movements. We show that, although state-of-the-art saliency models perform well in predicting the first two fixations in a visual search task, their performance degrades to chance afterward. This suggests that saliency maps alone are good to model bottom-up first impressions, but are not enough to explain the scanpaths when top-down task information is critical. Thus, we propose to use them as priors of Bayesian searchers. This approach leads to a behavior very similar to humans for the whole scanpath, both in the percentage of targets found as a function of the fixation rank and the scanpath similarity, reproducing the entire sequence of eye movements.
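To make the approach concrete, below is a minimal Python sketch of a Bayesian searcher that uses a saliency map as its prior over target locations, in the spirit of the model described in the abstract. The grid discretization, the Gaussian visibility (d') map, the Gaussian observation model, the greedy maximum-a-posteriori fixation rule, and all names (`bayesian_searcher`, `visibility_sigma`, `noise_sigma`) are illustrative assumptions, not the authors' implementation; the paper's searcher variants, visibility function, and stopping criteria differ.

```python
import numpy as np


def bayesian_searcher(saliency_map, target_pos, n_fixations=12,
                      visibility_sigma=3.0, noise_sigma=1.0, rng=None):
    """Greedy Bayesian searcher sketch: the saliency map acts as the prior
    over target location, each fixation yields noisy evidence whose
    reliability decays with distance from the fovea, and the next fixation
    is the current maximum-a-posteriori location."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = saliency_map.shape

    # Prior over target location taken directly from the saliency map.
    prior = saliency_map / saliency_map.sum()
    log_post = np.log(prior + 1e-12)

    ys, xs = np.mgrid[0:h, 0:w]
    # Assumed starting point: the most salient cell.
    fixation = np.unravel_index(np.argmax(prior), prior.shape)
    scanpath = [fixation]

    # Target signal: 1 at the true target cell, 0 elsewhere.
    signal = np.zeros_like(prior)
    signal[target_pos] = 1.0

    for _ in range(n_fixations):
        # Visibility map: evidence reliability decays with eccentricity.
        dist2 = (ys - fixation[0]) ** 2 + (xs - fixation[1]) ** 2
        d_prime = np.exp(-dist2 / (2 * visibility_sigma ** 2))

        # Noisy observation at every cell of the grid.
        obs = d_prime * signal + noise_sigma * rng.standard_normal(prior.shape)

        # Bayesian update: per-cell log-likelihood ratio of "target here"
        # vs. "target elsewhere" under the Gaussian observation model.
        log_post += (d_prime * obs - 0.5 * d_prime ** 2) / noise_sigma ** 2
        log_post -= log_post.max()  # keep values numerically stable

        # Greedy MAP rule: fixate the currently most probable location.
        fixation = np.unravel_index(np.argmax(log_post), log_post.shape)
        scanpath.append(fixation)
        if fixation == tuple(target_pos):  # simple stopping rule
            break
    return scanpath


# Toy usage: a random "saliency map" on a 16x16 grid, target at cell (10, 5).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    saliency = rng.random((16, 16))
    print(bayesian_searcher(saliency, target_pos=(10, 5), rng=rng))
```

A fuller ideal-observer variant would choose the fixation that maximizes the expected probability of correctly localizing the target rather than the current MAP cell; scanpaths produced either way can then be compared against human data using the targets-found-per-fixation-rank curves and scanpath-similarity measures mentioned in the abstract.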
