Paper Title
Mitigating Out-of-Distribution Data Density Overestimation in Energy-Based Models
Paper Authors
Paper Abstract
Deep energy-based models (EBMs), which use deep neural networks (DNNs) as energy functions, are receiving increasing attention due to their ability to learn complex distributions. To train deep EBMs, maximum likelihood estimation (MLE) with short-run Langevin Monte Carlo (LMC) is often used. While MLE with short-run LMC is computationally efficient compared to MLE with full Markov chain Monte Carlo (MCMC), it often assigns high density to out-of-distribution (OOD) data. To address this issue, we systematically investigate why MLE with short-run LMC can converge to EBMs with wrong density estimates, and reveal that the heuristic modifications to LMC introduced by previous works are the main problem. We then propose a Uniform Support Partitioning (USP) scheme that optimizes a set of points to evenly partition the support of the EBM and then uses the resulting points to approximate the EBM-MLE loss gradient. We empirically demonstrate that USP avoids the pitfalls of short-run LMC, leading to significantly improved OOD data detection performance on Fashion-MNIST.
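To make the short-run LMC procedure mentioned in the abstract concrete, the following is a minimal sketch of an unadjusted Langevin sampler on a toy quadratic energy. The energy function, step size, and chain length here are illustrative assumptions, not the paper's actual settings or architecture.

```python
import numpy as np

def energy(x):
    # Toy quadratic energy E(x) = 0.5 * ||x||^2, minimized at the origin.
    # A real deep EBM would use a DNN here instead.
    return 0.5 * np.sum(x ** 2)

def energy_grad(x):
    # Gradient of the toy energy: dE/dx = x.
    return x

def short_run_lmc(x0, step=0.1, n_steps=50, rng=None):
    # Unadjusted Langevin update (one common form):
    #   x_{k+1} = x_k - (step / 2) * grad E(x_k) + sqrt(step) * N(0, I)
    # "Short-run" refers to using only a small, fixed number of steps
    # rather than running the chain to convergence.
    rng = np.random.default_rng(0) if rng is None else rng
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x - 0.5 * step * energy_grad(x) + np.sqrt(step) * noise
    return x

# Starting far from the mode, a short chain drifts toward low-energy
# (high-density) regions, but with few steps it need not mix fully --
# the kind of bias the abstract's USP scheme is designed to avoid.
sample = short_run_lmc(np.full(2, 10.0))
```

In MLE training of an EBM, samples like these serve as the "negative" examples in the contrastive gradient of the log-likelihood; the quality of the density estimate depends on how faithfully the short chain covers the model's support.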