Paper Title

Learning to hash with semantic similarity metrics and empirical KL divergence

Paper Authors

Arponen, Heikki, Bishop, Tom E.

Paper Abstract

Learning to hash is an efficient paradigm for exact and approximate nearest neighbor search from massive databases. Binary hash codes are typically extracted from an image by rounding the output features of a CNN trained on a supervised binary similar/dissimilar task. Drawbacks of this approach are: (i) the resulting codes do not necessarily capture the semantic similarity of the input data; (ii) rounding results in information loss, which manifests as decreased retrieval performance; and (iii) using only class-wise similarity as a target can lead to trivial solutions that simply encode classifier outputs rather than learning more intricate relations, which most performance metrics fail to detect. We overcome (i) via a novel loss function that encourages the relative hash code distances of learned features to match those derived from their targets. We address (ii) via a differentiable estimate of the KL divergence between the network outputs and a binary target distribution, resulting in minimal information loss when the features are rounded to binary. Finally, we resolve (iii) by focusing on a hierarchical precision metric. The efficacy of the methods is demonstrated with semantic image retrieval on the CIFAR-100, ImageNet and Conceptual Captions datasets, using similarities inferred from the WordNet label hierarchy or from sentence embeddings.
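The two loss ideas in the abstract can be illustrated in simplified form: a distance-matching term that encourages relative feature distances to follow target-derived distances, and a histogram-based (empirical) KL estimate measuring how far the feature distribution is from a binary-valued target. The sketch below is a rough NumPy illustration, not the authors' implementation; the function names, the mean-normalization of distance matrices, and the choice of a two-Gaussian mixture at ±1 as the binary target distribution are all assumptions made here for clarity.

```python
import numpy as np

def pairwise_dists(X):
    # Euclidean distance matrix between rows of X
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.sqrt(np.maximum(d2, 0.0))

def relative_distance_loss(features, targets):
    """Penalize mismatch between *relative* pairwise distances of the
    learned features and those of the target (semantic) embeddings.
    (Sketch only; the paper's exact loss may differ.)"""
    df = pairwise_dists(features)
    dt = pairwise_dists(targets)
    # Normalize by the mean distance so only relative geometry matters
    df = df / (df.mean() + 1e-8)
    dt = dt / (dt.mean() + 1e-8)
    return float(np.mean((df - dt) ** 2))

def empirical_kl_to_binary(features, bins=20, eps=1e-8):
    """Histogram estimate of KL(p_features || p_binary), where the binary
    target is modeled here (an assumption) as a mixture of two narrow
    Gaussians centered at -1 and +1."""
    hist, edges = np.histogram(features.ravel(), bins=bins, range=(-2.0, 2.0))
    p = hist / hist.sum() + eps
    centers = 0.5 * (edges[:-1] + edges[1:])
    q = np.exp(-(centers - 1.0) ** 2 / 0.02) + np.exp(-(centers + 1.0) ** 2 / 0.02)
    q = q / q.sum() + eps
    return float(np.sum(p * np.log(p / q)))
```

Under this sketch, features that already sit near ±1 incur a small KL penalty and thus lose little information when rounded to binary codes, while features spread across the interval are pushed toward the two modes.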
