Paper Title

Automatic Extraction of Agriculture Terms from Domain Text: A Survey of Tools and Techniques

Authors

Niladri Chatterjee, Neha Kaushik

Abstract

Agriculture is a key component of any country's development. Domain-specific knowledge resources help in gaining insight into the domain. Existing knowledge resources such as AGROVOC and the NAL Thesaurus are developed and maintained by domain experts. The population of terms into these knowledge resources can be automated by using automatic term extraction tools to process unstructured agricultural text. Automatic term extraction is also a key component in many semantic web applications, such as ontology creation, recommendation systems, sentiment classification, and query expansion, among others. The primary goal of an automatic term extraction system is to maximize the number of valid terms and minimize the number of invalid terms extracted from the input set of documents. Despite its importance in various applications, the availability of online tools for this purpose is rather limited. Moreover, the performance of the most popular ones varies significantly. As a consequence, selecting the right term extraction tool is perceived as a serious problem for different knowledge-based applications. This paper presents an analysis of three commonly used term extraction tools, viz. RAKE, TerMine, and TermRaider, and compares their performance in terms of precision and recall vis-a-vis RENT, a more recent term extractor developed by these authors for the agriculture domain.
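
The abstract frames evaluation as maximizing valid terms and minimizing invalid ones, measured by precision and recall. A minimal sketch of that computation is given below. It assumes a term counts as valid only if it exactly matches an entry in a gold-standard list after lowercasing and whitespace normalization; the abstract does not state the matching rule the paper uses, and the function name and sample terms are hypothetical.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) of extracted terms against a gold-standard list.

    Assumption: exact match after lowercasing and collapsing whitespace.
    """
    # Normalize so that "Crop Rotation" and "crop  rotation" compare equal.
    extracted_set = {" ".join(t.lower().split()) for t in extracted}
    gold_set = {" ".join(t.lower().split()) for t in gold}

    # Valid terms are the extracted terms that also appear in the gold list.
    valid = extracted_set & gold_set

    # Precision: share of extracted terms that are valid.
    precision = len(valid) / len(extracted_set) if extracted_set else 0.0
    # Recall: share of gold terms that were extracted.
    recall = len(valid) / len(gold_set) if gold_set else 0.0
    return precision, recall


# Hypothetical data, for illustration only.
gold_terms = ["crop rotation", "soil erosion", "drip irrigation", "organic farming"]
extracted_terms = ["Crop Rotation", "soil erosion", "last year",
                   "farming method", "drip irrigation"]

p, r = precision_recall(extracted_terms, gold_terms)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.60 recall=0.75
```

Here three of the five extracted terms are in the gold list, giving a precision of 0.60, and three of the four gold terms were found, giving a recall of 0.75. Running the same function over the output of each tool on the same documents would give directly comparable figures.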
