HVS Revisited: A Comprehensive Video Quality Assessment Framework

Paper Title

HVS Revisited: A Comprehensive Video Quality Assessment Framework

Authors

Zhang, Ao-Xiang; Wang, Yuan-Gen; Tang, Weixuan; Li, Leida; Kwong, Sam

Abstract

Video quality is a primary concern for video service providers. In recent years, video quality assessment (VQA) techniques based on deep convolutional neural networks (CNNs) have developed rapidly. Although existing works attempt to introduce knowledge of the human visual system (HVS) into VQA, limitations remain that prevent the full exploitation of HVS, including an incomplete model built on too few characteristics and insufficient connections among these characteristics. To overcome these limitations, this paper revisits HVS with five representative characteristics and reorganizes their connections. Based on the revisited HVS, a no-reference VQA framework called HVS-5M (an NRVQA framework with five modules simulating HVS with five characteristics) is proposed. It works in a domain-fusion design paradigm with advanced network structures. In the spatial domain, the visual saliency module applies SAMNet to obtain a saliency map. The content-dependency and edge masking modules then each use ConvNeXt to extract spatial features, which are attentively weighted by the saliency map to highlight the regions that viewers are likely to be interested in. In the temporal domain, to supplement the static spatial features, the motion perception module uses SlowFast to obtain dynamic temporal features. In addition, the temporal hysteresis module applies TempHyst to simulate the human memory mechanism and comprehensively evaluates the quality score from the fused spatial and temporal features. Extensive experiments show that HVS-5M outperforms state-of-the-art VQA methods. Ablation studies are further conducted to verify the effectiveness of each module in the proposed framework.
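
Two ideas from the abstract, saliency-weighted spatial features and memory-aware temporal pooling, can be illustrated with a small sketch. This is not the HVS-5M implementation: the abstract does not specify how SAMNet, ConvNeXt, SlowFast, or TempHyst are wired together, so the function names, the `window` and `gamma` parameters, and the min/softmin pooling rule below are all assumptions chosen only for illustration, and plain Python lists stand in for real network outputs.

```python
import math


def saliency_weighted_pool(feature_map, saliency_map):
    """Average a 2-D feature map, weighting each location by its saliency.

    Hypothetical helper: both arguments are equally sized lists of rows.
    """
    total_weight = sum(sum(row) for row in saliency_map)
    if total_weight == 0:
        # No salient region at all: fall back to a plain average.
        values = [v for row in feature_map for v in row]
        return sum(values) / len(values)
    weighted = sum(
        f * s
        for f_row, s_row in zip(feature_map, saliency_map)
        for f, s in zip(f_row, s_row)
    )
    return weighted / total_weight


def hysteresis_pool(frame_scores, window=2, gamma=0.5):
    """Pool per-frame quality scores with a simple memory effect.

    Assumed rule (not taken from the abstract): each frame blends the
    worst score among the preceding `window` frames with a softmin-weighted
    average over the current and next `window` frames, so quality drops
    weigh more than recoveries.
    """
    n = len(frame_scores)
    pooled = []
    for t in range(n):
        past = frame_scores[max(0, t - window):t]
        memory = min(past) if past else frame_scores[t]
        upcoming = frame_scores[t:min(n, t + window + 1)]
        weights = [math.exp(-q) for q in upcoming]
        current = sum(q * w for q, w in zip(upcoming, weights)) / sum(weights)
        pooled.append(gamma * memory + (1 - gamma) * current)
    return sum(pooled) / n


if __name__ == "__main__":
    # A single-row feature map where only the second location is salient.
    print(saliency_weighted_pool([[1.0, 3.0]], [[0.0, 1.0]]))
    # A brief quality drop pulls the pooled score below the plain mean.
    scores = [5.0, 5.0, 1.0, 5.0, 5.0]
    print(hysteresis_pool(scores), sum(scores) / len(scores))
```

In this toy setting, a single low-quality frame drags the pooled score well below the arithmetic mean, which is the qualitative behavior a hysteresis-style module is meant to capture.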
