How deep is your encoder: an analysis of features descriptors for an autoencoder-based audio-visual quality metric

Title

How deep is your encoder: an analysis of features descriptors for an autoencoder-based audio-visual quality metric

Authors

Martinez, Helard; Hines, Andrew; Farias, Mylene C. Q.

Abstract

Developing audio-visual quality assessment models that make accurate predictions poses a number of challenges. One of them is modelling the complex interaction between audio and visual stimuli and how human users interpret that interaction. The No-Reference Audio-Visual Quality Metric Based on a Deep Autoencoder (NAViDAd) addresses this problem from a machine learning perspective. The metric takes two sets of feature descriptors, one for audio and one for video, and produces a low-dimensional set of features that is used to predict audio-visual quality. A basic implementation of NAViDAd produced accurate predictions when tested on a range of audio-visual databases. The current work performs an ablation study on the metric's base architecture: several modules are removed or retrained with different configurations to better understand how the metric works. The results presented in this study provide important feedback that helps us understand the real capacity of the metric's architecture and, eventually, develop a much better audio-visual quality metric.
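The pipeline described above (audio and video feature descriptors in, a low-dimensional autoencoder code out, then a quality prediction) and the idea of an ablation study can be illustrated with a small sketch. This is a minimal NumPy illustration, not the NAViDAd implementation: the single tanh hidden layer, the least-squares regressor that maps the code to a score, all function names, and the synthetic data from the hypothetical `make_toy_data` are assumptions. The ablation here only drops one modality at a time and reports the held-out Pearson linear correlation coefficient (PLCC); the study itself removes or retrains modules of the real architecture.

```python
import numpy as np


def make_toy_data(n=200, seed=1):
    """Hypothetical synthetic clips: a latent audio and video quality per clip
    drives noisy descriptors (4 audio, 8 video) and a toy MOS."""
    rng = np.random.default_rng(seed)
    audio_q = rng.uniform(1.0, 5.0, n)
    video_q = rng.uniform(1.0, 5.0, n)
    audio = np.outer(audio_q, rng.uniform(0.5, 1.5, 4)) + rng.normal(0.0, 0.3, (n, 4))
    video = np.outer(video_q, rng.uniform(0.5, 1.5, 8)) + rng.normal(0.0, 0.3, (n, 8))
    # Assumed fusion rule for the toy MOS: video weighs more than audio.
    mos = 0.3 * audio_q + 0.7 * video_q + rng.normal(0.0, 0.2, n)
    return audio, video, mos


def encode(x, w_enc, b_enc):
    """Map standardized descriptors to the low-dimensional bottleneck code."""
    return np.tanh(x @ w_enc + b_enc)


def train_autoencoder(x, hidden_dim, epochs=2000, lr=0.1, seed=0):
    """Assumed minimal design: one tanh hidden layer, linear decoder,
    full-batch gradient descent on the mean squared reconstruction error."""
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    w_enc = rng.normal(0.0, 0.1, (d, hidden_dim))
    b_enc = np.zeros(hidden_dim)
    w_dec = rng.normal(0.0, 0.1, (hidden_dim, d))
    b_dec = np.zeros(d)
    for _ in range(epochs):
        h = encode(x, w_enc, b_enc)
        err = h @ w_dec + b_dec - x                 # reconstruction residual
        g_out = 2.0 * err / err.size                # dMSE / d(reconstruction)
        g_pre = (g_out @ w_dec.T) * (1.0 - h ** 2)  # backprop through tanh
        w_dec -= lr * (h.T @ g_out)
        b_dec -= lr * g_out.sum(axis=0)
        w_enc -= lr * (x.T @ g_pre)
        b_enc -= lr * g_pre.sum(axis=0)
    return w_enc, b_enc, w_dec, b_dec


def fit_predictor(z, mos):
    """Stand-in regressor: least-squares linear map (with intercept) to MOS."""
    a = np.hstack([z, np.ones((len(z), 1))])
    coef, *_ = np.linalg.lstsq(a, mos, rcond=None)
    return coef


def predict(z, coef):
    return np.hstack([z, np.ones((len(z), 1))]) @ coef


def pearson(a, b):
    return float(np.corrcoef(a, b)[0, 1])


def ablation(audio, video, mos, hidden_dim=4, n_train=150):
    """Retrain encoder and regressor per configuration; report held-out PLCC."""
    configs = {
        "audio+video": np.hstack([audio, video]),
        "audio only": audio,
        "video only": video,
    }
    scores = {}
    for name, feats in configs.items():
        tr, te = feats[:n_train], feats[n_train:]
        mu, sd = tr.mean(axis=0), tr.std(axis=0)  # train-split statistics only
        w_enc, b_enc, _, _ = train_autoencoder((tr - mu) / sd, hidden_dim)
        z_tr = encode((tr - mu) / sd, w_enc, b_enc)
        z_te = encode((te - mu) / sd, w_enc, b_enc)
        coef = fit_predictor(z_tr, mos[:n_train])
        scores[name] = pearson(predict(z_te, coef), mos[n_train:])
    return scores


if __name__ == "__main__":
    audio, video, mos = make_toy_data()
    for name, plcc in ablation(audio, video, mos).items():
        print(f"{name}: held-out PLCC = {plcc:.3f}")
```

Removing an input and retraining the remaining modules is the simplest form of ablation; the same loop could instead vary `hidden_dim` to probe how far the bottleneck can be compressed before prediction accuracy drops.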
