Title

Adaptive Perturbation-Based Gradient Estimation for Discrete Latent Variable Models

Authors

Pasquale Minervini, Luca Franceschi, Mathias Niepert

Abstract


The integration of discrete algorithmic components in deep learning architectures has numerous applications. Recently, Implicit Maximum Likelihood Estimation (IMLE, Niepert, Minervini, and Franceschi 2021), a class of gradient estimators for discrete exponential family distributions, was proposed by combining implicit differentiation through perturbation with the path-wise gradient estimator. However, due to the finite difference approximation of the gradients, it is especially sensitive to the choice of the finite difference step size, which needs to be specified by the user. In this work, we present Adaptive IMLE (AIMLE), the first adaptive gradient estimator for complex discrete distributions: it adaptively identifies the target distribution for IMLE by trading off the density of gradient information with the degree of bias in the gradient estimates. We empirically evaluate our estimator on synthetic examples, as well as on Learning to Explain, Discrete Variational Auto-Encoders, and Neural Relational Inference tasks. In our experiments, we show that our adaptive gradient estimator can produce faithful estimates while requiring orders of magnitude fewer samples than other gradient estimators.
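To make the finite-difference step size the abstract refers to concrete, here is a minimal sketch of an IMLE-style perturbation-based gradient estimate for a top-k constraint, assuming a simple perturb-and-MAP setup with Gumbel noise. The function names (`map_topk`, `imle_gradient`) and the specific form of the backward pass are illustrative, not the authors' exact implementation; `lam` plays the role of the user-specified step size whose choice AIMLE adapts automatically.

```python
import numpy as np

def map_topk(theta, k):
    """Illustrative MAP solver: one-hot indicator of the k largest scores."""
    z = np.zeros_like(theta)
    z[np.argsort(theta)[-k:]] = 1.0
    return z

def imle_gradient(theta, dL_dz, k, lam, rng, num_samples=10):
    """Sketch of a finite-difference IMLE-style gradient estimate.

    theta:  logits of the discrete distribution
    dL_dz:  gradient of the downstream loss w.r.t. the discrete sample z
    lam:    finite-difference step size (the hyperparameter IMLE is
            sensitive to, which AIMLE chooses adaptively)
    """
    grads = np.zeros_like(theta)
    for _ in range(num_samples):
        eps = rng.gumbel(size=theta.shape)          # perturb-and-MAP noise
        z_pos = map_topk(theta + eps, k)            # sample from p(z; theta)
        # Target distribution shifted against the loss gradient:
        z_neg = map_topk(theta - lam * dL_dz + eps, k)
        grads += (z_pos - z_neg) / lam              # finite-difference estimate
    return grads / num_samples
```

With a small `lam` the two MAP calls rarely differ and the estimate is sparse (little gradient information); with a large `lam` the estimate is denser but more biased. Trading off these two effects is exactly what the proposed adaptive estimator automates.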
