S3VAE: Self-Supervised Sequential VAE for Representation Disentanglement and Data Generation

Paper Title

S3VAE: Self-Supervised Sequential VAE for Representation Disentanglement and Data Generation

Authors

Yizhe Zhu, Martin Renqiang Min, Asim Kadav, Hans Peter Graf

Abstract

We propose a sequential variational autoencoder to learn disentangled representations of sequential data (e.g., video and audio) under self-supervision. Specifically, we exploit readily accessible supervisory signals from the input data itself or from off-the-shelf functional models, and accordingly design auxiliary tasks that let our model use these signals. With the supervision of these signals, our model can easily disentangle the representation of an input sequence into static factors and dynamic factors (i.e., time-invariant and time-varying parts). Comprehensive experiments across video and audio verify the effectiveness of our model on representation disentanglement and generation of sequential data, and demonstrate that our model with self-supervision performs comparably to, if not better than, the fully supervised model with ground-truth labels, and outperforms state-of-the-art unsupervised models by a large margin.
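
The static/dynamic split described in the abstract can be read as a factorized latent-variable model. The following is a sketch under assumptions, not a formula quoted from this page: the symbols $z_f$ (static, time-invariant factor), $z_t$ (dynamic factor at step $t$), $x_t$ (observation at step $t$), and $T$ (sequence length) are introduced here for illustration, following the common formulation of disentangled sequential VAEs.

```latex
% Assumed generative factorization: one static latent z_f per sequence,
% one dynamic latent z_t per time step, each conditioned on earlier steps.
p(x_{1:T}, z_{1:T}, z_f)
  = p(z_f) \prod_{t=1}^{T} p(z_t \mid z_{<t}) \, p(x_t \mid z_t, z_f)
```

Under this reading, $z_f$ is sampled once and shared by every step of the sequence, while $z_t$ varies over time. The auxiliary self-supervised tasks mentioned in the abstract would then contribute extra terms to the training objective that push each latent toward its intended role.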
