Paper Title
Predicting Goal-directed Attention Control Using Inverse-Reinforcement Learning
Paper Authors
Paper Abstract
Understanding how goal states control behavior is a question ripe for interrogation by new methods from machine learning. These methods require large and labeled datasets to train models. To annotate a large-scale image dataset with observed search fixations, we collected 16,184 fixations from people searching for either microwaves or clocks in a dataset of 4,366 images (MS-COCO). We then used this behaviorally annotated dataset and the machine learning method of Inverse-Reinforcement Learning (IRL) to learn target-specific reward functions and policies for these two target goals. Finally, we used these learned policies to predict the fixations of 60 new behavioral searchers (clock = 30, microwave = 30) in a disjoint test dataset of kitchen scenes depicting both a microwave and a clock (thus controlling for differences in low-level image contrast). We found that the IRL model predicted behavioral search efficiency and fixation-density maps using multiple metrics. Moreover, reward maps from the IRL model revealed target-specific patterns that suggest not just attention guidance by target features, but also guidance by scene context (e.g., fixations along walls in the search for clocks). Using machine learning and the psychologically meaningful principle of reward, it is possible to learn the visual features used in goal-directed attention control.
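To make the prediction step concrete, the sketch below shows one simplified way a learned reward map could be turned into a predicted fixation sequence: greedily fixating the highest-reward location, then suppressing its neighborhood (inhibition of return) before selecting the next fixation. This is an illustrative assumption, not the paper's actual IRL model or policy; the function name, parameters, and the greedy-selection rule are all hypothetical.

```python
import numpy as np

def predict_fixations(reward_map, n_fixations=5, ior_radius=2):
    """Greedily generate a fixation scanpath from a learned reward map.

    Repeatedly fixates the highest-reward cell, then suppresses a square
    neighborhood around it (inhibition of return) so later fixations move
    to new locations. A toy stand-in for rolling out a learned policy.
    """
    reward = reward_map.astype(float).copy()
    h, w = reward.shape
    fixations = []
    for _ in range(n_fixations):
        # Next fixation: the currently most rewarding location.
        y, x = np.unravel_index(np.argmax(reward), reward.shape)
        fixations.append((int(y), int(x)))
        # Inhibition of return: mask out a window around the fixation.
        y0, y1 = max(0, y - ior_radius), min(h, y + ior_radius + 1)
        x0, x1 = max(0, x - ior_radius), min(w, x + ior_radius + 1)
        reward[y0:y1, x0:x1] = -np.inf
    return fixations
```

For example, on a reward map with two peaks, the scanpath visits the stronger peak first and the weaker one second; comparing such predicted scanpaths against human fixation-density maps is the kind of evaluation the abstract describes.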