Where to Look and How to Describe: Fashion Image Retrieval with an Attentional Heterogeneous Bilinear Network

Paper Title

Where to Look and How to Describe: Fashion Image Retrieval with an Attentional Heterogeneous Bilinear Network

Authors

Haibo Su, Peng Wang, Lingqiao Liu, Hui Li, Zhen Li, Yanning Zhang

Abstract

Fashion products typically feature compositions of various styles across different clothing parts. To distinguish images of different fashion products, we need to extract both appearance (i.e., "how to describe") and localization (i.e., "where to look") information, as well as their interactions. To this end, we propose a biologically inspired framework for image-based fashion product retrieval that mimics the hypothesized two-stream visual processing system of the human brain. The proposed attentional heterogeneous bilinear network (AHBN) consists of two branches: a deep CNN branch that extracts fine-grained appearance attributes and a fully convolutional branch that extracts landmark localization information. A joint channel-wise attention mechanism is then applied to the extracted heterogeneous features to focus on important channels, followed by a compact bilinear pooling layer that models the interaction of the two streams. The proposed framework achieves satisfactory performance on three image-based fashion product retrieval benchmarks.
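
The abstract names the components of AHBN but not their exact form, so the following is only a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes an SE-style (squeeze-and-excitation) gate computed from both streams for the "joint channel-wise attention", the Tensor Sketch (count sketch plus FFT) approximation for "compact bilinear pooling", and signed-square-root plus L2 normalisation of the pooled descriptor. All function names, sizes, and weights are hypothetical, and random arrays stand in for the outputs of the CNN branch and the landmark branch.

```python
import numpy as np


def channel_attention(feat_a, feat_b, w1, w2):
    """Hypothetical joint channel-wise attention (SE-style gating).

    feat_a: (C_a, H, W) appearance-stream features.
    feat_b: (C_b, H, W) landmark-stream features (same H, W assumed).
    w1: (r, C_a + C_b) and w2: (C_a + C_b, r) bottleneck weights.
    """
    # Average-pool every channel of both streams and concatenate them,
    # so each channel's gate is computed from both streams jointly.
    pooled = np.concatenate([feat_a.mean(axis=(1, 2)), feat_b.mean(axis=(1, 2))])
    hidden = np.maximum(w1 @ pooled, 0.0)          # ReLU bottleneck
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # sigmoid, one gate per channel
    c_a = feat_a.shape[0]
    return feat_a * gates[:c_a, None, None], feat_b * gates[c_a:, None, None]


def sketch_matrix(c, d, rng):
    """Random count-sketch projection: channel i goes to bin h[i] with sign s[i]."""
    h = rng.integers(0, d, size=c)
    s = rng.choice([-1.0, 1.0], size=c)
    p = np.zeros((c, d))
    p[np.arange(c), h] = s
    return p


def tensor_sketch(xa, xb, p_a, p_b):
    """Count sketch of the outer product of xa and xb, computed as the circular
    convolution of the two individual sketches via FFT (last axis = channels)."""
    d = p_a.shape[1]
    fa = np.fft.rfft(xa @ p_a, axis=-1)
    fb = np.fft.rfft(xb @ p_b, axis=-1)
    return np.fft.irfft(fa * fb, n=d, axis=-1)


def compact_bilinear(feat_a, feat_b, p_a, p_b):
    """Compact bilinear pooling of two (C, H, W) maps into one d-dim descriptor."""
    xa = feat_a.reshape(feat_a.shape[0], -1).T        # (H*W, C_a)
    xb = feat_b.reshape(feat_b.shape[0], -1).T        # (H*W, C_b)
    z = tensor_sketch(xa, xb, p_a, p_b).sum(axis=0)   # sum-pool over locations
    z = np.sign(z) * np.sqrt(np.abs(z))               # signed square root
    return z / (np.linalg.norm(z) + 1e-12)            # L2 normalisation


def rank_gallery(query, gallery):
    """Gallery row indices sorted by cosine similarity (rows L2-normalised)."""
    return np.argsort(-(gallery @ query))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    app = rng.normal(size=(64, 7, 7))   # stand-in appearance features
    lmk = rng.normal(size=(8, 7, 7))    # stand-in landmark features
    w1 = rng.normal(size=(16, 72)) * 0.1
    w2 = rng.normal(size=(72, 16)) * 0.1
    p_a, p_b = sketch_matrix(64, 512, rng), sketch_matrix(8, 512, rng)
    desc = compact_bilinear(*channel_attention(app, lmk, w1, w2), p_a, p_b)
    print(desc.shape)  # (512,)
```

The FFT step is what keeps the pooling compact: the full outer product of a 64-channel and an 8-channel feature would have 512 entries per location, and with realistic channel counts it grows far larger, whereas the sketch dimension d is fixed in advance. For retrieval, descriptors computed this way for a query and a gallery would be compared with `rank_gallery`.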
