Paper Title

Wavelet Diffusion Models are fast and scalable Image Generators

Authors

Hao Phung, Quan Dao, Anh Tran

Abstract

Diffusion models are emerging as a powerful solution for high-fidelity image generation, often exceeding GANs in quality. However, their slow training and inference speed is a severe bottleneck that blocks them from real-time applications. A recent DiffusionGAN method significantly decreases the models' running time by cutting the number of sampling steps from thousands to several, but its speed still lags far behind GAN counterparts. This paper aims to reduce the speed gap by proposing a novel wavelet-based diffusion scheme. We extract low- and high-frequency components at both the image and feature levels via wavelet decomposition and handle these components adaptively for faster processing while maintaining good generation quality. Furthermore, we propose a reconstruction term that effectively boosts the convergence of model training. Experimental results on the CelebA-HQ, CIFAR-10, LSUN-Church, and STL-10 datasets show that our solution is a stepping stone toward real-time, high-fidelity diffusion models. Our code and pre-trained checkpoints are available at https://github.com/VinAIResearch/WaveDiff.git.
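
To make the wavelet decomposition concrete, the sketch below is an illustrative PyTorch example (an assumption on our part, not the authors' released code): a single-level 2D Haar transform that splits an image batch into one low-frequency subband (LL) and three high-frequency detail subbands (LH, HL, HH), each at half the spatial resolution, together with its exact inverse. Running the diffusion process on such a compact four-subband representation is what allows a wavelet-based scheme to cut per-step computation.

```python
# Illustrative sketch only (assumed PyTorch implementation, not the authors' code):
# a single-level 2D Haar wavelet transform and its exact inverse.
import torch

def haar_dwt_2d(x: torch.Tensor):
    """Split x of shape (B, C, H, W), with even H and W, into four (B, C, H/2, W/2) subbands."""
    a = x[..., 0::2, 0::2]  # top-left pixel of every 2x2 block
    b = x[..., 0::2, 1::2]  # top-right
    c = x[..., 1::2, 0::2]  # bottom-left
    d = x[..., 1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2  # local average: low-frequency content
    lh = (a + b - c - d) / 2  # top-minus-bottom difference within each block
    hl = (a - b + c - d) / 2  # left-minus-right difference within each block
    hh = (a - b - c + d) / 2  # diagonal difference
    return ll, lh, hl, hh

def haar_idwt_2d(ll, lh, hl, hh):
    """Exact inverse: the orthonormal Haar transform above is its own inverse."""
    B, C, h, w = ll.shape
    x = ll.new_zeros(B, C, 2 * h, 2 * w)
    x[..., 0::2, 0::2] = (ll + lh + hl + hh) / 2
    x[..., 0::2, 1::2] = (ll + lh - hl - hh) / 2
    x[..., 1::2, 0::2] = (ll - lh + hl - hh) / 2
    x[..., 1::2, 1::2] = (ll - lh - hl + hh) / 2
    return x

# A 256x256 RGB image becomes four 128x128 subbands and reconstructs exactly.
img = torch.randn(1, 3, 256, 256)
subbands = haar_dwt_2d(img)
assert torch.allclose(haar_idwt_2d(*subbands), img, atol=1e-5)
```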
