Paper Title

A Corpus Study and Annotation Schema for Named Entity Recognition and Relation Extraction of Business Products

Authors

Saskia Schön, Veselina Mironova, Aleksandra Gabryszak, Leonhard Hennig

Abstract

Recognizing non-standard entity types and relations, such as B2B products, product classes and their producers, in news and forum texts is important in application areas such as supply chain monitoring and market research. However, there is a decided lack of annotated corpora and annotation guidelines in this domain. In this work, we present a corpus study, an annotation schema and associated guidelines, for the annotation of product entity and company-product relation mentions. We find that although product mentions are often realized as noun phrases, defining their exact extent is difficult due to high boundary ambiguity and the broad syntactic and semantic variety of their surface realizations. We also describe our ongoing annotation effort, and present a preliminary corpus of English web and social media documents annotated according to the proposed guidelines.
