End-to-end Sinkhorn Autoencoder with Noise Generator

Paper title

End-to-end Sinkhorn Autoencoder with Noise Generator

Authors

Deja, Kamil, Dubiński, Jan, Nowak, Piotr, Wenzel, Sandro, Trzciński, Tomasz

Abstract

In this work, we propose a novel end-to-end Sinkhorn autoencoder with a noise generator for efficient simulation of data collection. Simulating processes that aim at collecting experimental data is crucial for multiple real-life applications, including nuclear medicine, astronomy and high-energy physics. Contemporary methods, such as Monte Carlo algorithms, provide high-fidelity results at the price of high computational cost. Multiple attempts have been made to reduce this burden, e.g. using generative approaches based on Generative Adversarial Networks or Variational Autoencoders. Although such methods are much faster, they are often unstable in training and do not allow sampling from the entire data distribution. To address these shortcomings, we introduce a novel method dubbed the end-to-end Sinkhorn Autoencoder, which leverages the Sinkhorn algorithm to explicitly align the distributions of encoded real data examples and generated noise. More precisely, we extend the autoencoder architecture by adding a deterministic neural network trained to map noise from a known distribution onto the autoencoder latent space representing the data distribution. We optimise the entire model jointly. Our method outperforms competing approaches on a challenging dataset of simulation data from the Zero Degree Calorimeters of the ALICE experiment at the LHC, as well as on standard benchmarks such as MNIST and CelebA.
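
To make the alignment step in the abstract more concrete, the following is a minimal NumPy sketch of log-domain Sinkhorn iterations between two point clouds, standing in for encoded data and noise-generator outputs. It is not the authors' implementation: the function name `sinkhorn_plan`, the squared Euclidean cost, the uniform sample weights, and the values of `eps` and `n_iters` are all assumptions chosen for illustration, and the random arrays replace the neural networks the paper trains.

```python
import numpy as np


def _logsumexp(z, axis):
    # Numerically stable log(sum(exp(z))) along one axis.
    m = z.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(z - m).sum(axis=axis, keepdims=True))).squeeze(axis=axis)


def sinkhorn_plan(x, y, eps=0.1, n_iters=200):
    """Entropy-regularised transport plan between two uniformly weighted point clouds.

    x: (n, d) array, e.g. latent codes of real data (assumed stand-in).
    y: (m, d) array, e.g. outputs of a noise generator (assumed stand-in).
    Returns the (n, m) plan and the transport cost under that plan.
    """
    n, m = x.shape[0], y.shape[0]
    # Pairwise squared Euclidean distances, shape (n, m).
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    log_a = np.full(n, -np.log(n))  # uniform weights on x
    log_b = np.full(m, -np.log(m))  # uniform weights on y
    f = np.zeros(n)  # dual potential for x
    g = np.zeros(m)  # dual potential for y
    for _ in range(n_iters):
        # Alternately rescale so that row and column marginals match.
        f = eps * (log_a - _logsumexp((g[None, :] - cost) / eps, axis=1))
        g = eps * (log_b - _logsumexp((f[:, None] - cost) / eps, axis=0))
    plan = np.exp((f[:, None] + g[None, :] - cost) / eps)
    return plan, float((plan * cost).sum())


# Hypothetical stand-ins for encoder outputs and generated noise.
rng = np.random.default_rng(0)
latent_codes = rng.normal(size=(32, 2))
generated = rng.normal(size=(32, 2)) + 10.0
plan, cost = sinkhorn_plan(latent_codes, generated)
```

In an end-to-end setting, a cost of this kind would be computed with a differentiable framework so that its gradient can update the encoder and the noise generator together; the NumPy version above only illustrates the iteration itself.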
