Paper Title

Supervised Deep Hashing for High-dimensional and Heterogeneous Case-based Reasoning

Authors

Zhang, Qi; Hu, Liang; Shi, Chongyang; Liu, Ke; Cao, Longbing

Abstract

Case-based Reasoning (CBR) on high-dimensional and heterogeneous data is a trending yet challenging and computationally expensive task in the real world. A promising approach is to obtain low-dimensional hash codes representing cases and perform similarity retrieval of cases in Hamming space. However, previous methods based on data-independent hashing rely on random projections or manual construction; because they are insensitive to data characteristics, they cannot address specific data issues (e.g., high dimensionality and heterogeneity). To address these issues, this work introduces a novel deep hashing network that learns similarity-preserving compact hash codes for efficient case retrieval, and proposes a deep-hashing-enabled CBR model, HeCBR. Specifically, we introduce position embedding to represent heterogeneous features and use a multilinear interaction layer to obtain case embeddings; this layer effectively filters out zero-valued features to tackle high dimensionality and sparsity, and captures inter-feature couplings. Then, we feed the case embeddings into fully-connected layers, after which a hash layer generates hash codes, with a quantization regularizer controlling the quantization loss during relaxation. To support incremental learning in CBR, we further propose an adaptive learning strategy for updating the hash function. Extensive experiments on public datasets show that HeCBR greatly reduces storage and significantly accelerates case retrieval. HeCBR achieves desirable performance compared with state-of-the-art CBR methods and performs significantly better than hashing-based CBR methods in classification.
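The hashing and retrieval stages described in the abstract (relaxed hash outputs, a quantization regularizer, binary codes, and case retrieval in Hamming space) can be illustrated with a minimal sketch. It assumes a tanh relaxation and a squared-error quantization term, which are common choices in deep hashing but are not confirmed here as HeCBR's exact formulation. All function names are hypothetical, the position-embedding and multilinear interaction layers are omitted, and `z` simply stands for the fully-connected layers' output for one case.

```python
import math


def hash_layer(z):
    """Relaxed hash outputs h = tanh(z); each value lies in (-1, 1) (assumed relaxation)."""
    return [math.tanh(v) for v in z]


def binarize(h):
    """Discrete hash code b = sign(h) in {-1, +1}; zero is mapped to +1 by assumption."""
    return [1 if v >= 0 else -1 for v in h]


def quantization_loss(h):
    """Assumed quantization regularizer: sum_i (b_i - h_i)^2 with b = sign(h).

    It is zero only when every relaxed output already sits at -1 or +1, so
    penalizing it keeps the relaxation close to the binary codes used at retrieval.
    """
    b = binarize(h)
    return sum((bi - hi) ** 2 for bi, hi in zip(b, h))


def hamming(b1, b2):
    """Number of bit positions where two equal-length codes differ."""
    if len(b1) != len(b2):
        raise ValueError("codes must have the same length")
    return sum(1 for x, y in zip(b1, b2) if x != y)


def retrieve(query, case_base, k):
    """Indices of the k stored codes closest to `query` in Hamming distance (stable on ties)."""
    ranked = sorted(range(len(case_base)), key=lambda i: hamming(query, case_base[i]))
    return ranked[:k]


if __name__ == "__main__":
    code = binarize(hash_layer([2.0, -0.5, 0.1, -3.0]))
    case_base = [[1, -1, 1, -1], [1, 1, 1, 1], [-1, 1, -1, 1]]
    print(code, retrieve(code, case_base, 2))
```

Because the codes take values in {-1, +1}, the Hamming distance between two K-bit codes also equals (K - b1·b2)/2, so comparing compact codes reduces to cheap bit or inner-product operations. This is one reason hashing lowers storage and speeds up case retrieval relative to comparing raw high-dimensional feature vectors.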
