Doubly Reparameterized Importance Weighted Structure Learning for Scene Graph Generation

Paper title

Doubly Reparameterized Importance Weighted Structure Learning for Scene Graph Generation

Authors

Daqi Liu, Miroslaw Bober, Josef Kittler

Abstract

As a structured prediction task, scene graph generation takes an input image and aims to explicitly model objects and their relationships by constructing a visually grounded scene graph. In the current literature, this task is universally solved via a mean field variational Bayesian methodology based on message passing neural networks. The classical, loose evidence lower bound is generally chosen as the variational inference objective, which can induce an oversimplified variational approximation and thus underestimate the underlying complex posterior. In this paper, we propose a novel doubly reparameterized importance weighted structure learning method, which employs a tighter importance weighted lower bound as the variational inference objective. The bound is computed from multiple samples drawn from a reparameterizable Gumbel-Softmax sampler, and the resulting constrained variational inference task is solved by a generic entropic mirror descent algorithm. The resulting doubly reparameterized gradient estimator reduces the variance of the corresponding derivatives, with a beneficial impact on learning. The proposed method achieves state-of-the-art performance on various popular scene graph generation benchmarks.
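The three building blocks named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy logits, the log-weights, and the step size are all hypothetical, and the doubly reparameterized gradient estimator itself is not reproduced here. The sketch only shows a Gumbel-Softmax draw, the importance weighted bound computed from K log-weights (which by Jensen's inequality is never below the averaged single-sample bound), and one entropic mirror descent update on the probability simplex.

```python
import math
import random


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def gumbel_softmax_sample(logits, temperature, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / temperature)."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise by inverse transform; clamp u away from 0 to avoid log(0).
        u = max(rng.random(), 1e-12)
        g = -math.log(-math.log(u))
        noisy.append((logit + g) / temperature)
    norm = logsumexp(noisy)
    return [math.exp(v - norm) for v in noisy]


def importance_weighted_bound(log_weights):
    """log((1/K) * sum_k w_k), given log w_k = log p(x, z_k) - log q(z_k)."""
    return logsumexp(log_weights) - math.log(len(log_weights))


def elbo_estimate(log_weights):
    """Average of log w_k over the same K samples (the looser bound)."""
    return sum(log_weights) / len(log_weights)


def entropic_mirror_descent_step(probs, grad, step_size):
    """Exponentiated-gradient update; assumes every entry of probs is > 0."""
    log_unnorm = [math.log(p) - step_size * g for p, g in zip(probs, grad)]
    norm = logsumexp(log_unnorm)
    # Renormalizing keeps the iterate on the probability simplex.
    return [math.exp(v - norm) for v in log_unnorm]


# Toy inputs (hypothetical values chosen only for illustration).
rng = random.Random(0)
sample = gumbel_softmax_sample([1.0, 0.5, -1.0], temperature=0.5, rng=rng)
log_w = [-2.3, -1.1, -3.0, -1.7]
iw_bound = importance_weighted_bound(log_w)
elbo = elbo_estimate(log_w)
updated = entropic_mirror_descent_step([0.25, 0.25, 0.5], [1.0, 0.0, -1.0], step_size=0.1)
```

The multiplicative form of the mirror descent step is what makes it suited to constrained problems of this kind: the update stays non-negative and normalized without an explicit projection.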
