Paper Title

Dynamical Variational Autoencoders: A Comprehensive Review

Paper Authors

Girin, Laurent, Leglaive, Simon, Bie, Xiaoyu, Diard, Julien, Hueber, Thomas, Alameda-Pineda, Xavier

Paper Abstract

Variational autoencoders (VAEs) are powerful deep generative models widely used to represent high-dimensional complex data through a low-dimensional latent space learned in an unsupervised manner. In the original VAE model, the input data vectors are processed independently. Recently, a series of papers have presented different extensions of the VAE to process sequential data, which model not only the latent space but also the temporal dependencies within a sequence of data vectors and corresponding latent vectors, relying on recurrent neural networks or state-space models. In this paper, we perform a literature review of these models. We introduce and discuss a general class of models, called dynamical variational autoencoders (DVAEs), which encompasses a large subset of these temporal VAE extensions. Then, we present in detail seven recently proposed DVAE models, with an aim to homogenize the notations and presentation lines, as well as to relate these models with existing classical temporal models. We have reimplemented those seven DVAE models and present the results of an experimental benchmark conducted on the speech analysis-resynthesis task (the PyTorch code is made publicly available). The paper concludes with a discussion on important issues concerning the DVAE class of models and future research guidelines.
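The abstract describes models in which both the observed sequence and the latent sequence carry temporal dependencies, and notes that DVAEs relate to classical temporal models. As a minimal illustrative sketch (not the paper's implementation), the generative structure can be shown with its simplest classical special case, a linear-Gaussian state-space model, where each latent vector z_t depends on z_{t-1} and each observation x_t is generated from z_t. All dimensions and parameters below are arbitrary toy choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical, chosen only for illustration)
latent_dim, obs_dim, seq_len = 2, 4, 5

# Linear transition and emission matrices of the state-space model
A = 0.9 * np.eye(latent_dim)                    # latent transition: z_{t-1} -> z_t
C = rng.standard_normal((obs_dim, latent_dim))  # emission: z_t -> x_t

def sample_sequence(T=seq_len):
    """Sample (z_1..z_T, x_1..x_T) from the temporal generative model:
    z_t ~ N(A z_{t-1}, I), x_t ~ N(C z_t, 0.01 I)."""
    z = np.zeros(latent_dim)
    zs, xs = [], []
    for _ in range(T):
        z = A @ z + rng.standard_normal(latent_dim)      # latent dynamics
        x = C @ z + 0.1 * rng.standard_normal(obs_dim)   # observation model
        zs.append(z)
        xs.append(x)
    return np.stack(zs), np.stack(xs)

zs, xs = sample_sequence()
print(zs.shape, xs.shape)  # (5, 2) (5, 4)
```

In a DVAE, the linear maps above are replaced by neural networks (e.g. recurrent networks), and the inference of z given x is performed by an amortized encoder trained with the variational lower bound, as reviewed in the paper.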
