CNNs for JPEGs: A Study in Computational Cost

Paper Title

CNNs for JPEGs: A Study in Computational Cost

Authors

Santos, Samuel Felipe dos, Sebe, Nicu, Almeida, Jurandy

Abstract

Convolutional neural networks (CNNs) have achieved astonishing advances over the past decade, defining the state of the art in several computer vision tasks. CNNs are capable of learning robust representations of the data directly from RGB pixels. However, most image data are available in a compressed format, of which JPEG is the most widely used for transmission and storage, and such data demand a preliminary decoding process with a high computational load and memory usage. For this reason, deep learning methods capable of learning directly from the compressed domain have been gaining attention in recent years. These methods usually extract a frequency-domain representation of the image, such as the DCT, through partial decoding, and then adapt typical CNN architectures to work with it. One limitation of current works is that the modifications made to the original model to accommodate frequency-domain data significantly increase its number of parameters and its computational complexity. On one hand, these methods have faster preprocessing, since the cost of fully decoding the images is avoided; on the other hand, the cost of passing the images through the model increases, offsetting the potential speedup. In this paper, we present a further study of the computational cost of deep models designed for the frequency domain, evaluating both the cost of decoding and the cost of passing the images through the network. We also propose handcrafted and data-driven techniques for reducing the computational complexity and the number of parameters of these models in order to keep them similar to their RGB baselines, leading to efficient models with a better trade-off between computational cost and accuracy.
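The frequency-domain input mentioned in the abstract comes from the fact that JPEG stores each color component as 8x8 blocks of DCT coefficients. The sketch below is an illustration under stated assumptions, not the authors' pipeline: it recomputes the orthonormal 2-D DCT-II from pixel values with NumPy (a real partial decoder would instead read the quantized coefficients from the entropy-coded stream), then rearranges the coefficients into a (64, H/8, W/8) tensor, a layout frequency-domain CNNs commonly consume. The helper names `dct_matrix`, `blockwise_dct`, and `conv_weight_count` are hypothetical.

```python
import numpy as np

# Illustrative sketch (assumption): recompute JPEG-style 8x8 DCT coefficients
# from pixels. A real partial decoder would read quantized coefficients from the file.


def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis; row k holds the k-th cosine frequency."""
    k = np.arange(n).reshape(-1, 1)  # frequency index
    x = np.arange(n).reshape(1, -1)  # spatial index
    m = np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    m[0, :] /= np.sqrt(2.0)  # DC row gets the smaller normalisation factor
    return m * np.sqrt(2.0 / n)


def blockwise_dct(channel: np.ndarray, n: int = 8) -> np.ndarray:
    """Map an (H, W) channel to an (n*n, H//n, W//n) coefficient tensor.

    Output channel u*n + v holds frequency (u, v) of every block.
    """
    h, w = channel.shape
    if h % n or w % n:
        raise ValueError("pad the channel to a multiple of the block size")
    d = dct_matrix(n)
    # (H, W) -> (H//n, W//n, n, n): one n x n block per grid cell.
    blocks = channel.reshape(h // n, n, w // n, n).transpose(0, 2, 1, 3)
    # Separable 2-D DCT of every block at once: D @ B @ D^T.
    coeffs = d @ blocks @ d.T
    # Move the n*n frequencies of each block into the channel axis.
    return coeffs.reshape(h // n, w // n, n * n).transpose(2, 0, 1)


def conv_weight_count(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a kernel x kernel convolution (biases ignored)."""
    return kernel * kernel * c_in * c_out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    luma = rng.random((32, 48))  # stand-in for a Y (luma) channel
    print(blockwise_dct(luma).shape)  # (64, 4, 6)
    print(conv_weight_count(3, 3, 64), conv_weight_count(3, 64, 64))  # 1728 36864
```

As a rough illustration of why adapting the input can inflate a model, a 3x3 convolution with 64 filters reading three RGB channels has 1,728 weights, while the same layer reading the 64 luma DCT channels has 36,864, even though it runs on a map 8 times smaller along each side. This arithmetic is only an example and is not the paper's cost accounting.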
