End to End Lip Synchronization with a Temporal AutoEncoder

Paper Title

End to End Lip Synchronization with a Temporal AutoEncoder

Authors

Yoav Shalev, Lior Wolf

Abstract

We study the problem of syncing the lip movement in a video with the audio stream. Our solution finds an optimal alignment using a dual-domain recurrent neural network that is trained on synthetic data we generate by dropping and duplicating video frames. Once the alignment is found, we modify the video in order to sync the two sources. Our method is shown to greatly outperform existing methods from the literature on a variety of existing and new benchmarks. As an application, we demonstrate our ability to robustly align text-to-speech generated audio with an existing video stream. Our code and samples are available at https://github.com/itsyoavshalev/End-to-End-Lip-Synchronization-with-a-Temporal-AutoEncoder.
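The abstract describes two mechanical steps around the network: creating misaligned training data by dropping and duplicating video frames, and re-timing the video once an alignment is known. The sketch below illustrates only those two steps, under stated assumptions; it is not the authors' code (their implementation is in the repository linked above). The function names `perturb_frame_indices`, `ground_truth_alignment` and `resync`, the drop/duplicate probabilities, and the choice to encode an alignment as "one source-frame index per audio-synced frame" are all hypothetical. Here the alignment is computed directly from the known perturbation; in the paper it is the dual-domain RNN that estimates the alignment at inference time.

```python
import numpy as np


def perturb_frame_indices(num_frames, p_drop=0.1, p_dup=0.1, seed=None):
    """Hypothetical synthetic-data step: randomly drop and duplicate frames.

    Returns src_idx such that perturbed[k] == original[src_idx[k]].
    Frame order is preserved, so src_idx is non-decreasing.
    """
    rng = np.random.default_rng(seed)
    src_idx = []
    for i in range(num_frames):
        r = rng.random()
        if r < p_drop:
            continue                  # drop frame i entirely
        src_idx.append(i)
        if r < p_drop + p_dup:
            src_idx.append(i)         # duplicate frame i once
    return np.asarray(src_idx, dtype=np.int64)


def ground_truth_alignment(src_idx, num_frames):
    """For each audio-synced frame j, pick the first perturbed frame that
    shows frame j or, if j was dropped, the next later surviving frame."""
    if len(src_idx) == 0:
        raise ValueError("all frames were dropped")
    targets = np.arange(num_frames)
    align = np.searchsorted(src_idx, targets, side="left")
    return np.clip(align, 0, len(src_idx) - 1)


def resync(video, align):
    """Re-time a video so that output frame j is video[align[j]]."""
    return video[align]


# Toy usage: every 8x8 "frame" is filled with its own original index.
num_frames = 50
video = np.arange(num_frames, dtype=np.float32)[:, None, None] * np.ones(
    (1, 8, 8), dtype=np.float32)
src_idx = perturb_frame_indices(num_frames, seed=0)
perturbed = video[src_idx]                      # out-of-sync training clip
align = ground_truth_alignment(src_idx, num_frames)
restored = resync(perturbed, align)             # back on the audio timeline
```

Every frame that survived the perturbation lands back at its original position in `restored`; dropped frames are filled with the next surviving frame, which is one simple choice among several (nearest or previous frame would work equally well for illustration).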
