Paper Title

An Effective Automatic Image Annotation Model Via Attention Model and Data Equilibrium

Authors

Amir Vatani, Milad Taleby Ahvanooey, Mostafa Rahimi

Abstract

Nowadays, a huge number of images are available, yet retrieving the image an ordinary user needs remains a challenging task for computer vision systems. Over the past two decades, many studies have aimed to improve the performance of automatic image annotation, work that traditionally focused on content-based image retrieval. However, recent research demonstrates that there is a semantic gap between content-based image retrieval and the image semantics understandable by humans. As a result, existing research in this area has sought to bridge the semantic gap between low-level image features and high-level semantics. The conventional way of bridging this gap is automatic image annotation (AIA), which extracts semantic features using machine learning techniques. In this paper, we propose a novel AIA model based on a deep learning feature extraction method. The proposed model has three phases: a feature extractor, a tag generator, and an image annotator. First, the model automatically extracts high- and low-level features based on the dual-tree complex wavelet transform (DT-CWT), singular value decomposition, the distribution of color tones, and a deep neural network. Next, the tag generator balances the dictionary of annotated keywords with a new log-entropy auto-encoder (LEAE) and then describes these keywords by word embedding. Finally, the annotator works on the basis of a long short-term memory (LSTM) network to obtain the importance degree of specific features of the image. Experiments conducted on two benchmark datasets confirm the superiority of the proposed model over previous models in terms of performance criteria.
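The abstract only names the building blocks, so below is a minimal sketch of how the attention-weighted LSTM annotator and the log-entropy keyword weighting could be put together. Everything in it is an illustrative assumption rather than the paper's actual implementation: the PyTorch framing, the names AttentionLSTMAnnotator and log_entropy_weights, the feature dimensions, and the premise that per-region feature vectors (DT-CWT, SVD, color, and CNN features) arrive already extracted and concatenated.

```python
# Hypothetical sketch (not the paper's code): an attention-weighted LSTM
# annotator over pre-extracted per-region feature vectors, plus standard
# log-entropy term weighting as one plausible reading of the LEAE input.
import torch
import torch.nn as nn


class AttentionLSTMAnnotator(nn.Module):
    def __init__(self, feat_dim: int, hidden_dim: int, vocab_size: int):
        super().__init__()
        self.attn = nn.Linear(feat_dim, 1)        # scores each region's importance
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, vocab_size)

    def forward(self, region_feats: torch.Tensor) -> torch.Tensor:
        # region_feats: (batch, num_regions, feat_dim)
        weights = torch.softmax(self.attn(region_feats), dim=1)   # (batch, R, 1)
        weighted = region_feats * weights                         # re-weight regions
        out, _ = self.lstm(weighted)                              # (batch, R, hidden)
        return self.classifier(out[:, -1, :])                     # keyword logits


def log_entropy_weights(tf: torch.Tensor) -> torch.Tensor:
    # tf: (num_images, num_keywords) keyword-count matrix.
    # w_ij = log(1 + tf_ij) * (1 - H_j / log N), the usual log-entropy scheme.
    n = tf.shape[0]
    p = tf / tf.sum(dim=0, keepdim=True).clamp(min=1e-12)
    entropy = -(p * torch.log(p.clamp(min=1e-12))).sum(dim=0)
    return torch.log1p(tf) * (1.0 - entropy / torch.log(torch.tensor(float(n))))


# Usage sketch: 4 images, 9 regions each, 512-d features, 260 keywords.
model = AttentionLSTMAnnotator(feat_dim=512, hidden_dim=256, vocab_size=260)
keyword_logits = model(torch.randn(4, 9, 512))   # (4, 260)
```

Here the softmax attention weights stand in for the "importance degree of specific features" mentioned in the abstract, and the keyword logits would be thresholded or top-k selected to produce the final annotations.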
