Decomposition, Compression, and Synthesis (DCS)-based Video Coding: A Neural Exploration via Resolution-Adaptive Learning

Paper Title

Decomposition, Compression, and Synthesis (DCS)-based Video Coding: A Neural Exploration via Resolution-Adaptive Learning

Authors

Ming Lu, Tong Chen, Dandan Ding, Fengqing Zhu, Zhan Ma

Abstract

Inspired by the fact that retinal cells segregate the visual scene into different attributes (e.g., spatial details, temporal motion) for separate neuronal processing, we propose to first decompose the input video into spatial texture frames (STFs) at the native spatial resolution, which preserve the rich spatial details, and temporal motion frames (TMFs) at a lower spatial resolution, which retain the motion smoothness; then compress them together using any popular video coder; and finally synthesize the decoded STFs and TMFs for high-fidelity video reconstruction at the same resolution as the native input. This work simply applies bicubic resampling in decomposition and an HEVC-compliant codec in compression, and puts the focus on the synthesis part. For resolution-adaptive synthesis, a motion compensation network (MCN) is devised on TMFs to efficiently align and aggregate temporal motion features, which are then jointly processed with the corresponding STFs using a non-local texture transfer network (NL-TTN) to better augment spatial details; in this way, the compression and resolution-resampling noise can be effectively alleviated, with better rate-distortion efficiency. This "Decomposition, Compression, Synthesis (DCS)" scheme is codec agnostic and currently demonstrates an average $\approx$1 dB PSNR gain or $\approx$25% BD-rate saving against the HEVC anchor using reference software. In addition, experimental comparisons with state-of-the-art methods and ablation studies are conducted to further report the efficiency and generalization of the DCS algorithm, suggesting an encouraging direction for future video coding.
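
The decomposition step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how frames are assigned to STFs or TMFs, so the `stf_interval` parameter (every n-th frame is kept as an STF) is a hypothetical choice, and block averaging stands in for the bicubic resampling named in the abstract to keep the example dependency-free. The function names `downsample` and `decompose` are likewise illustrative.

```python
from typing import List, Tuple

Frame = List[List[float]]  # one grayscale frame as rows of pixel values


def downsample(frame: Frame, factor: int) -> Frame:
    """Shrink a frame by averaging factor x factor blocks.

    Stand-in for the bicubic resampling named in the abstract.
    Assumes the frame height and width are multiples of `factor`.
    """
    height, width = len(frame), len(frame[0])
    out = []
    for y in range(0, height, factor):
        row = []
        for x in range(0, width, factor):
            block = [frame[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def decompose(frames: List[Frame], stf_interval: int,
              factor: int) -> Tuple[dict, dict]:
    """Split a clip into native-resolution STFs and low-resolution TMFs.

    `stf_interval` is a hypothetical parameter: every stf_interval-th
    frame is kept untouched as an STF, all others become TMFs.
    Returns two dicts keyed by the original frame index.
    """
    stfs, tmfs = {}, {}
    for index, frame in enumerate(frames):
        if index % stf_interval == 0:
            stfs[index] = frame                      # keep spatial detail
        else:
            tmfs[index] = downsample(frame, factor)  # keep motion only
    return stfs, tmfs


if __name__ == "__main__":
    # Four flat 4x4 frames whose pixel value equals the frame index.
    clip = [[[float(i)] * 4 for _ in range(4)] for i in range(4)]
    stfs, tmfs = decompose(clip, stf_interval=4, factor=2)
    print(sorted(stfs), sorted(tmfs))
```

In the scheme the abstract describes, both sets of frames would then be passed to a video coder, and the decoded TMFs would be restored to native resolution by the learned synthesis networks (MCN and NL-TTN), which this sketch does not attempt to model.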

Paper Title