Paper Title
Improved Stein Variational Gradient Descent with Importance Weights
Paper Authors
Paper Abstract
Stein Variational Gradient Descent (SVGD) is a popular sampling algorithm used in various machine learning tasks. It is well known that SVGD arises from a discretization of the kernelized gradient flow of the Kullback-Leibler divergence $D_{KL}\left(\cdot\mid\pi\right)$, where $\pi$ is the target distribution. In this work, we propose to enhance SVGD via the introduction of importance weights, which leads to a new method for which we coin the name $\beta$-SVGD. In the continuous time and infinite particles regime, the time for this flow to converge to the equilibrium distribution $\pi$, quantified by the Stein Fisher information, depends on $\rho_0$ and $\pi$ very weakly. This is very different from the kernelized gradient flow of the Kullback-Leibler divergence, whose time complexity depends on $D_{KL}\left(\rho_0\mid\pi\right)$. Under certain assumptions, we provide a descent lemma for the population limit $\beta$-SVGD, which covers the descent lemma for the population limit SVGD when $\beta\to 0$. We also illustrate the advantages of $\beta$-SVGD over SVGD by experiments.
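For context, the baseline the abstract builds on is the standard SVGD particle update. Below is a minimal NumPy sketch of one SVGD step, assuming an RBF kernel with a fixed bandwidth h and an illustrative step size; neither choice is taken from the paper. The proposed $\beta$-SVGD additionally attaches importance weights to the particles; its precise weighting scheme is defined in the paper and is not reproduced here.

```python
import numpy as np

def svgd_step(X, score, step_size=0.1, h=1.0):
    """One standard SVGD update for a particle set X of shape (n, d).

    score(X) must return the target score grad log pi evaluated at each
    particle, also of shape (n, d). An RBF kernel with bandwidth h is used.
    """
    n = X.shape[0]
    diffs = X[:, None, :] - X[None, :, :]                     # x_i - x_j, shape (n, n, d)
    K = np.exp(-np.sum(diffs ** 2, axis=-1) / (2 * h ** 2))   # K[i, j] = k(x_i, x_j)
    # phi(x_i) = (1/n) sum_j [ k(x_j, x_i) grad log pi(x_j) + grad_{x_j} k(x_j, x_i) ]
    attraction = K @ score(X)                                  # kernel-smoothed score term
    repulsion = (K.sum(axis=1, keepdims=True) * X - K @ X) / h ** 2  # repulsive kernel-gradient term
    return X + step_size * (attraction + repulsion) / n

# Example: sample from a standard Gaussian, whose score is grad log pi(x) = -x,
# starting from a deliberately poor initialization rho_0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) * 3.0 + 5.0
for _ in range(500):
    X = svgd_step(X, lambda x: -x, step_size=0.1)
```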