论文标题
不对称学习的图像压缩,具有多尺度残差块,重要性图和定量后过滤
Asymmetric Learned Image Compression with Multi-Scale Residual Block, Importance Map, and Post-Quantization Filtering
论文作者
论文摘要
最近,基于深度学习的图像压缩已取得了显着的进步,并且在主观度量和更具挑战性的客观指标中,与最新的传统方法H.266/vvc相比,取得了更好的评级(R-D)性能。但是,一个主要的问题是,许多领先的学识渊博的方案无法保持绩效和复杂性之间的良好权衡。在本文中,我们提出了一个效率和有效的图像编码框架,该框架的复杂性比最高的状态具有相似的R-D性能。首先,我们开发了一个改进的多尺度残差块(MSRB),该块可以扩展容纳长石,并且更容易获得全球信息。它可以进一步捕获和减少潜在表示的空间相关性。其次,引入了更高级的重要性图网络,以自适应地分配位置到图像的不同区域。第三,我们应用了2D定量后flter(PQF)来减少视频编码中样本自适应偏移(SAO)FLETER的量化错误。此外,我们认为编码器和解码器的复杂性对图像压缩性能有不同的影响。基于此观察结果,我们设计了一个不对称的范式,其中编码器采用三个阶段的MSRB来提高学习能力,而解码器只需要一个srb的一个阶段就可以产生令人满意的重建,从而在不牺牲性能的情况下降低了解码复杂性。实验结果表明,与最先进的方法相比,所提出方法的编码和解码时间的速度约为17倍,而R-D性能仅在柯达和TECNICK数据集中降低了1%的速度,而H.266/VVC(4:4:4:4:4:4)和其他基于学习的方法仍然更好。我们的源代码可在https://github.com/fengyurenpingsheng上公开获得。
Recently, deep learning-based image compression has made signifcant progresses, and has achieved better ratedistortion (R-D) performance than the latest traditional method, H.266/VVC, in both subjective metric and the more challenging objective metric. However, a major problem is that many leading learned schemes cannot maintain a good trade-off between performance and complexity. In this paper, we propose an effcient and effective image coding framework, which achieves similar R-D performance with lower complexity than the state of the art. First, we develop an improved multi-scale residual block (MSRB) that can expand the receptive feld and is easier to obtain global information. It can further capture and reduce the spatial correlation of the latent representations. Second, a more advanced importance map network is introduced to adaptively allocate bits to different regions of the image. Third, we apply a 2D post-quantization flter (PQF) to reduce the quantization error, motivated by the Sample Adaptive Offset (SAO) flter in video coding. Moreover, We fnd that the complexity of encoder and decoder have different effects on image compression performance. Based on this observation, we design an asymmetric paradigm, in which the encoder employs three stages of MSRBs to improve the learning capacity, whereas the decoder only needs one stage of MSRB to yield satisfactory reconstruction, thereby reducing the decoding complexity without sacrifcing performance. Experimental results show that compared to the state-of-the-art method, the encoding and decoding time of the proposed method are about 17 times faster, and the R-D performance is only reduced by less than 1% on both Kodak and Tecnick datasets, which is still better than H.266/VVC(4:4:4) and other recent learning-based methods. Our source code is publicly available at https://github.com/fengyurenpingsheng.