A German Corpus for Fine-Grained Named Entity Recognition and Relation Extraction of Traffic and Industry Events

Paper title

A German Corpus for Fine-Grained Named Entity Recognition and Relation Extraction of Traffic and Industry Events

Authors

Martin Schiersch, Veselina Mironova, Maximilian Schmitt, Philippe Thomas, Aleksandra Gabryszak, Leonhard Hennig

Abstract

Monitoring mobility- and industry-relevant events is important in areas such as personal travel planning and supply chain management, but extracting events pertaining to specific companies, transit routes and locations from heterogeneous, high-volume text streams remains a significant challenge. This work describes a corpus of German-language documents which has been annotated with fine-grained geo-entities, such as streets, stops and routes, as well as standard named entity types. It has also been annotated with a set of 15 traffic- and industry-related n-ary relations and events, such as accidents, traffic jams, acquisitions, and strikes. The corpus consists of newswire texts, Twitter messages, and traffic reports from radio stations, police and railway companies. It allows for training and evaluating both named entity recognition algorithms that aim for fine-grained typing of geo-entities, as well as n-ary relation extraction systems.
