Causal Contextual Prediction for Learned Image Compression

Paper Title

Causal Contextual Prediction for Learned Image Compression

Authors

Zongyu Guo, Zhizheng Zhang, Runsen Feng, Zhibo Chen

Abstract

Over the past several years, we have witnessed impressive progress in the field of learned image compression. Recent learned image codecs are commonly based on autoencoders that first encode an image into low-dimensional latent representations and then decode them for reconstruction. To capture spatial dependencies in the latent space, prior works exploit hyperprior and spatial context models to build an entropy model, which estimates the bit-rate for end-to-end rate-distortion optimization. However, such an entropy model is suboptimal in two respects: (1) it fails to capture spatially global correlations among the latents, and (2) cross-channel relationships of the latents are still underexplored. In this paper, we propose the concept of separate entropy coding to leverage a serial decoding process for causal contextual entropy prediction in the latent space. We propose a causal context model that separates the latents across channels and makes use of cross-channel relationships to generate highly informative contexts. Furthermore, we propose a causal global prediction model, which is able to find global reference points for accurate predictions of unknown points. Both models facilitate entropy estimation without transmitting any overhead. In addition, we adopt a new separate attention module to build more powerful transform networks. Experimental results demonstrate that our full image compression model outperforms the standard VVC/H.266 codec on the Kodak dataset in terms of both PSNR and MS-SSIM, yielding state-of-the-art rate-distortion performance.
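
As a rough illustration of why a causal cross-channel context can lower the estimated bit-rate, the toy sketch below codes two groups of quantized latents in sequence and compares the bit cost of the second group with and without conditioning on the first, already decoded group. This is not the authors' model: the discretized Gaussian, the fixed zero-mean prior (a stand-in for a hyperprior), the hand-written latent values, and the "mean equals the co-located value in group A" predictor are all assumptions made for illustration.

```python
import math


def gaussian_cdf(x, mu, sigma):
    """Cumulative distribution function of a Gaussian N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def bits(y, mu, sigma):
    """Estimated bits for integer y under a Gaussian discretized to unit-width bins."""
    p = gaussian_cdf(y + 0.5, mu, sigma) - gaussian_cdf(y - 0.5, mu, sigma)
    return -math.log2(max(p, 1e-9))  # clamp to avoid log(0)


def total_bits(group_a, group_b, use_cross_channel_context):
    """Serially code group A, then group B, and return the total estimated bits."""
    # Group A is coded first with a fixed prior (assumed, toy stand-in for a hyperprior).
    rate = sum(bits(y, 0.0, 2.0) for y in group_a)
    for a, b in zip(group_a, group_b):
        if use_cross_channel_context:
            # Hypothetical predictor: group A is already decoded, so the decoder
            # can reproduce this prediction without any side information.
            mu, sigma = float(a), 1.0
        else:
            mu, sigma = 0.0, 2.0
        rate += bits(b, mu, sigma)
    return rate


# Hand-written, strongly correlated channel groups (illustrative values only).
group_a = [3, -2, 0, 4, -1, 2]
group_b = [3, -1, 0, 4, -2, 2]

with_context = total_bits(group_a, group_b, True)
without_context = total_bits(group_a, group_b, False)
print(f"with cross-channel context:    {with_context:.2f} bits")
print(f"without cross-channel context: {without_context:.2f} bits")
```

Because the prediction for group B depends only on values the decoder has already reconstructed, the ordering is causal and no extra bits are transmitted for the context itself. In a learned codec the predictor would be a trained network rather than the identity mapping assumed here.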
