SATBench: Benchmarking the speed-accuracy tradeoff in object recognition by humans and dynamic neural networks

Paper title

SATBench: Benchmarking the speed-accuracy tradeoff in object recognition by humans and dynamic neural networks

Authors

Ajay Subramanian, Sara Price, Omkar Kumbhar, Elena Sizikova, Najib J. Majaj, Denis G. Pelli

Abstract

The core of everyday tasks like reading and driving is active object recognition. Attempts to model such tasks are currently stymied by the inability to incorporate time. People show a flexible tradeoff between speed and accuracy, and this tradeoff is a crucial human skill. Deep neural networks have emerged as promising candidates for predicting peak human object recognition performance and neural activity. However, modeling the temporal dimension, i.e., the speed-accuracy tradeoff (SAT), is essential for them to serve as useful computational models of how humans recognize objects. To this end, we present the first large-scale (148 observers, 4 neural networks, 8 tasks) dataset of the SAT in recognizing ImageNet images. In each human trial, a beep, indicating the desired reaction time, sounds at a fixed delay after the image is presented, and the observer's response counts only if it occurs near the time of the beep. In a series of blocks, we test many beep latencies, i.e., reaction times. We observe that human accuracy increases with reaction time, and we proceed to compare its characteristics with the behavior of several dynamic neural networks that are capable of inference-time adaptive computation. Using FLOPs as an analog for reaction time, we compare networks with humans on curve-fit error, category-wise correlation, and curve steepness, and conclude that cascaded dynamic neural networks are a promising model of human reaction time in object recognition tasks.
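
The trial procedure in the abstract (a response counts only if it falls near the beep, and accuracy is then measured per beep latency) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function name `sat_curve`, the `(beep_delay, response_time, correct)` tuple layout, the 0.1 s tolerance window, and the toy trial values.

```python
from collections import defaultdict


def sat_curve(trials, tolerance=0.1):
    """Return {beep_delay: accuracy} for a list of trials.

    trials: iterable of (beep_delay_s, response_time_s, correct) tuples.
    tolerance: hypothetical half-width (seconds) of the window around the
    beep inside which a response is counted; the abstract does not give it.
    """
    counted = defaultdict(int)   # responses that landed near the beep
    correct = defaultdict(int)   # of those, how many were correct
    for beep, rt, ok in trials:
        if abs(rt - beep) <= tolerance:
            counted[beep] += 1
            correct[beep] += int(ok)
    return {beep: correct[beep] / counted[beep] for beep in sorted(counted)}


# Made-up trials, for illustration only (not data from the benchmark).
trials = [
    (0.2, 0.25, False),
    (0.2, 0.22, True),
    (0.2, 0.60, True),   # too far from the beep: not counted
    (0.8, 0.78, True),
    (0.8, 0.85, True),
]
curve = sat_curve(trials)
print(curve)
```

Plotting accuracy against beep latency gives a speed-accuracy curve for one observer. For a network, the abstract's analogy would put FLOPs on the horizontal axis in place of reaction time.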
