Paper Title

Leveraging Cognitive Search Patterns to Enhance Automated Natural Language Retrieval Performance

Authors

Bhawani Selvaretnam, Mohammed Belkhatir

Abstract

The search for information in large text repositories has been plagued by the so-called document-query vocabulary gap, i.e. the semantic discordance between the contents of the stored document entities on the one hand and the human query on the other. Over the past two decades, a significant body of work has advanced technical retrieval prowess, while several studies have shed light on issues pertaining to human search behavior. We believe that these efforts should be conjoined, in the sense that automated retrieval systems have to fully emulate human search behavior and thus consider the procedure by which users incrementally enhance their initial query. To this end, cognitive reformulation patterns that mimic user search behavior are highlighted, and enhancement terms that are statistically collocated with, or lexical-semantically related to, the original terms are adopted in the retrieval process. We formalize the application of these patterns by considering a conceptual representation of the query and introducing a set of operations that modify the initial query. A genetic algorithm-based weighting process places emphasis on terms according to their conceptual role-type. An experimental evaluation on real-world datasets against relevance, language, conceptual and knowledge-based models is conducted. We also show that, relative to language and relevance models, our approach achieves better mean average precision than a word embedding-based model instantiation.
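The abstract reports results in terms of mean average precision (MAP). As a reminder of what that metric measures, the following is a minimal sketch of the standard textbook definition, not the authors' evaluation code. The function names, the document identifiers, and the assumption that each ranked list contains no duplicate identifiers are all illustrative and do not come from the paper.

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranked_ids: Sequence[str], relevant_ids: Set[str]) -> float:
    """Average precision of one ranked result list (assumes unique ids)."""
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            # Precision at this rank, counted only where a relevant document appears.
            precision_sum += hits / rank
    # Normalise by the total number of relevant documents, retrieved or not.
    return precision_sum / len(relevant_ids)


def mean_average_precision(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of the per-query average precision over all queries."""
    scores = [average_precision(ranked, relevant) for ranked, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical example: two queries with hand-made rankings and judgments.
example_runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),
    (["d2", "d1"], {"d1"}),
]
example_map = mean_average_precision(example_runs)
```

Under this definition, a reformulated query scores higher than the original only if it moves relevant documents toward the top of the ranking, which is why MAP is a natural way to compare query enhancement strategies.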
