Mind Reader: Reconstructing complex images from brain activities

Paper Title

Mind Reader: Reconstructing complex images from brain activities

Authors

Sikun Lin, Thomas Sprague, Ambuj K. Singh

Abstract

Understanding how the brain encodes external stimuli and how these stimuli can be decoded from measured brain activity are long-standing and challenging questions in neuroscience. In this paper, we focus on reconstructing complex image stimuli from fMRI (functional magnetic resonance imaging) signals. Unlike previous works that reconstruct images containing single objects or simple shapes, our work aims to reconstruct image stimuli that are semantically rich, closer to everyday scenes, and able to reveal more perspectives. However, the scarcity of fMRI data is the main obstacle to applying state-of-the-art deep learning models to this problem. We find that incorporating an additional text modality benefits reconstruction compared with directly translating brain signals into images. Therefore, our method involves three modalities: (i) voxel-level fMRI signals, (ii) the observed images that trigger those brain signals, and (iii) textual descriptions of the images. To further address data scarcity, we leverage an aligned vision-language latent space pre-trained on massive datasets. Instead of training models from scratch to find a latent space shared by the three modalities, we encode fMRI signals into this pre-aligned latent space. Then, conditioned on embeddings in this space, we reconstruct images with a generative model. The reconstructed images from our pipeline balance naturalness and fidelity: they are photo-realistic and capture the ground-truth image content well.
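The central idea in the abstract is to map fMRI signals into an existing, pre-aligned vision-language embedding space rather than learning a shared space from scratch. The sketch below illustrates only that encoding step, under loud assumptions: the data are synthetic, the "pre-aligned" embeddings are random unit vectors standing in for CLIP-like vectors, voxels are assumed to respond linearly to those embeddings, and closed-form ridge regression is swapped in for whatever encoder the authors actually train. The names `fit_ridge`, `retrieval_accuracy`, and `simulate` are hypothetical. It does not model the text modality or the conditional generative model. It checks alignment by top-1 retrieval: each predicted embedding should be closest (by cosine similarity) to the embedding of its own stimulus.

```python
import numpy as np


def l2_normalize(z, eps=1e-8):
    """Scale each row to unit length (embedding spaces like CLIP's are compared by cosine)."""
    return z / (np.linalg.norm(z, axis=1, keepdims=True) + eps)


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha*I)^-1 X^T Y.

    Stand-in (assumption) for a learned fMRI encoder: maps voxels X (n x V)
    to embeddings Y (n x D).
    """
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(A, X.T @ Y)


def retrieval_accuracy(pred, targets):
    """Fraction of predictions whose most cosine-similar target is their own."""
    sims = l2_normalize(pred) @ l2_normalize(targets).T
    return float(np.mean(np.argmax(sims, axis=1) == np.arange(len(targets))))


def simulate(n_train=300, n_test=20, n_voxels=100, dim=16, noise=0.1, seed=0):
    """Toy experiment on synthetic data; all sizes and the noise level are hypothetical."""
    rng = np.random.default_rng(seed)
    n = n_train + n_test
    # Hypothetical pre-aligned embeddings, one per stimulus image.
    emb = l2_normalize(rng.standard_normal((n, dim)))
    # Assumed linear voxel responses to the latent content, plus measurement noise.
    mixing = rng.standard_normal((dim, n_voxels))
    fmri = emb @ mixing + noise * rng.standard_normal((n, n_voxels))
    # Fit the encoder on training stimuli, evaluate retrieval on held-out stimuli.
    W = fit_ridge(fmri[:n_train], emb[:n_train], alpha=1.0)
    return retrieval_accuracy(fmri[n_train:] @ W, emb[n_train:])


if __name__ == "__main__":
    print(simulate())
```

On this low-noise synthetic setup the held-out retrieval accuracy should be high; raising `noise` or shrinking `n_train` is a quick way to see how data scarcity erodes the mapping, which is the problem the pre-aligned space is meant to ease. In the full pipeline described above, the predicted embedding would then condition a generative model rather than being used only for retrieval.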
