Title
An Energy-Based Prior for Generative Saliency
Authors
Abstract
We propose a novel generative saliency prediction framework that adopts an informative energy-based model as a prior distribution. The energy-based prior model is defined on the latent space of a saliency generator network, which generates the saliency map from continuous latent variables and an observed image. The parameters of the saliency generator and the energy-based prior are trained jointly via Markov chain Monte Carlo-based maximum likelihood estimation, in which sampling from the intractable posterior and prior distributions of the latent variables is performed by Langevin dynamics. With the generative saliency model, we can obtain a pixel-wise uncertainty map from an image, indicating the model's confidence in its saliency prediction. Unlike existing generative models, which define the prior distribution of the latent variables as a simple isotropic Gaussian distribution, our model uses an informative energy-based prior that can be more expressive in capturing the latent space of the data. With this informative energy-based prior, we move beyond the Gaussian distribution assumption of generative models to obtain a more representative distribution of the latent space, leading to more reliable uncertainty estimation. We apply the proposed framework to both RGB and RGB-D salient object detection tasks, with both transformer and convolutional neural network backbones. We further propose an adversarial learning algorithm and a variational inference algorithm as alternatives for training the proposed generative framework. Experimental results show that our generative saliency model with an energy-based prior achieves not only accurate saliency predictions but also reliable uncertainty maps that are consistent with human perception. Results and code are available at https://github.com/JingZhang617/EBMGSOD.
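The abstract describes two mechanisms: Langevin dynamics for sampling latent variables from an energy-based prior, and a pixel-wise uncertainty map obtained from the generative model. The sketch below illustrates both under several assumptions that are not taken from the paper. It assumes the prior has the exponentially tilted form p_alpha(z) ∝ exp(f_alpha(z)) · N(z; 0, I), which is common in latent-space EBM work. It replaces the learned networks with hypothetical toy stand-ins (`grad_f_alpha`, `toy_generator`), and it computes uncertainty as the binary entropy of the mean of several sampled saliency maps. Step size, step count, and sample count are illustrative only. Only prior sampling is shown. Posterior sampling would add the gradient of the generator's log-likelihood log p_beta(y | x, z) to `grad_log_prior`.

```python
import math
import random

# Hypothetical toy gradient of f_alpha(z) = a*sum(z_i^2)/2 - b*sum(z_i^4)/4.
# It stands in for the learned energy network and is NOT the paper's model.
def grad_f_alpha(z, a=2.0, b=1.0):
    return [a * zi - b * zi ** 3 for zi in z]

def grad_log_prior(z):
    # Assumed prior p_alpha(z) ∝ exp(f_alpha(z)) * N(z; 0, I), so
    # grad_z log p_alpha(z) = grad_z f_alpha(z) - z (the -z comes from the Gaussian reference).
    return [g - zi for g, zi in zip(grad_f_alpha(z), z)]

def langevin_prior_sample(dim, n_steps=60, step=0.1, rng=None):
    """Short-run Langevin dynamics initialised from N(0, I):
    z <- z + (step^2 / 2) * grad log p(z) + step * eps,  eps ~ N(0, I)."""
    rng = rng or random.Random()
    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for _ in range(n_steps):
        g = grad_log_prior(z)
        z = [zi + 0.5 * step ** 2 * gi + step * rng.gauss(0.0, 1.0)
             for zi, gi in zip(z, g)]
    return z

# Hypothetical saliency generator g_beta(x, z): a per-pixel sigmoid of the
# (flattened) image value plus a scalar shift derived from the latent code.
def toy_generator(image, z):
    shift = sum(z) / len(z)
    return [1.0 / (1.0 + math.exp(-(px + shift))) for px in image]

def binary_entropy(p, eps=1e-12):
    # Clamp to avoid log(0); maximal (ln 2) at p = 0.5, near 0 at p close to 0 or 1.
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

def saliency_with_uncertainty(image, n_samples=20, dim=4, seed=0):
    """Draw several latents from the prior, decode each into a saliency map,
    and return the mean map and its per-pixel entropy (the uncertainty map)."""
    rng = random.Random(seed)
    maps = [toy_generator(image, langevin_prior_sample(dim, rng=rng))
            for _ in range(n_samples)]
    mean = [sum(col) / n_samples for col in zip(*maps)]
    uncertainty = [binary_entropy(p) for p in mean]
    return mean, uncertainty
```

For example, `saliency_with_uncertainty([-6.0, 0.0, 6.0])` treats the input as a three-pixel "image". The ambiguous middle pixel receives a mean near 0.5 and the highest entropy, while the two strongly biased pixels receive low entropy. In a real model, `grad_f_alpha` and the generator gradient would come from automatic differentiation rather than closed-form toy functions.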