Paper Title
See What You See: Self-supervised Cross-modal Retrieval of Visual Stimuli from Brain Activity
Paper Authors
Paper Abstract
Recent studies demonstrate the use of a two-stage supervised framework to generate images that depict human perception of visual stimuli from EEG signals, a task referred to as EEG-visual reconstruction. These methods, however, cannot reproduce the exact visual stimulus, since it is the human-specified annotation of the images, not the image data itself, that determines what the synthesized images depict. Moreover, synthesized images often suffer from noisy EEG encodings and unstable training of generative models, making them hard to recognize. Instead, we present a single-stage EEG-visual retrieval paradigm in which the data of the two modalities are correlated, as opposed to their annotations, allowing us to recover the exact visual stimulus for an EEG clip. We maximize the mutual information between the EEG encoding and the associated visual stimulus by optimizing a contrastive self-supervised objective, which brings two additional benefits. First, it enables EEG encodings to handle visual classes beyond those seen during training, since learning is not directed at class annotations. Second, the model is no longer required to generate every detail of the visual stimulus; instead, it focuses on cross-modal alignment and retrieves images at the instance level, ensuring distinguishable model outputs. Empirical studies are conducted on the largest single-subject EEG dataset that measures brain activities evoked by image stimuli. We demonstrate that the proposed approach completes an instance-level EEG-visual retrieval task which existing methods cannot. We also examine the implications of a range of EEG and visual encoder structures. Furthermore, on the widely studied semantic-level EEG-visual classification task, despite not using class annotations, the proposed method outperforms state-of-the-art supervised EEG-visual reconstruction approaches, particularly in its capability for open-class recognition.
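The contrastive self-supervised objective described in the abstract is, in spirit, an InfoNCE-style loss between EEG and image embeddings, which is a standard lower-bound surrogate for the mutual information between the two modalities. The following minimal PyTorch sketch illustrates one common way such an objective and the instance-level retrieval step could be realized; the function names, the temperature value, and the symmetric two-direction formulation are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def contrastive_infonce_loss(eeg_emb, img_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired (EEG, image) embeddings.

    A sketch of a contrastive objective: matched EEG/image pairs are pulled
    together while mismatched pairs within the batch serve as negatives.
    """
    # L2-normalize so that dot products are cosine similarities.
    eeg_emb = F.normalize(eeg_emb, dim=-1)
    img_emb = F.normalize(img_emb, dim=-1)

    # Pairwise similarity matrix: entry (i, j) compares EEG clip i with image j.
    logits = eeg_emb @ img_emb.t() / temperature

    # Matched pairs lie on the diagonal of the similarity matrix.
    targets = torch.arange(eeg_emb.size(0), device=eeg_emb.device)

    # Contrast in both directions: EEG -> image and image -> EEG.
    loss_e2i = F.cross_entropy(logits, targets)
    loss_i2e = F.cross_entropy(logits.t(), targets)
    return (loss_e2i + loss_i2e) / 2

def retrieve_images(eeg_emb, candidate_img_embs, top_k=5):
    """Instance-level retrieval: rank a gallery of candidate image embeddings
    by cosine similarity to the EEG encoding and return the top-k indices."""
    sims = F.normalize(eeg_emb, dim=-1) @ F.normalize(candidate_img_embs, dim=-1).t()
    return sims.topk(top_k, dim=-1).indices
```

In this sketch, retrieval simply ranks a gallery of candidate image embeddings against the EEG encoding; the matched stimulus is recovered as the top-ranked image, which is why the model need not generate pixel-level detail and its outputs remain distinguishable.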