Title
A Chinese Corpus for Fine-grained Entity Typing
Authors
Abstract
Fine-grained entity typing is a challenging task with a wide range of applications. However, most existing datasets for this task are in English. In this paper, we introduce a corpus for Chinese fine-grained entity typing that contains 4,800 mentions manually labeled through crowdsourcing. Each mention is annotated with free-form entity types. To make the dataset useful in more scenarios, we also group all the fine-grained types into 10 general types. Finally, we conduct experiments with several neural models whose architectures are typical in fine-grained entity typing and report how well they perform on our dataset. We also show that cross-lingual transfer learning can potentially improve Chinese fine-grained entity typing.
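The abstract does not specify a data format or an evaluation protocol. As a minimal sketch, assuming each mention is stored as a character span inside its sentence together with a set of free-form type labels, the Python below shows one way to represent such records, collapse fine-grained types into general types, and score multi-label predictions with strict accuracy and loose macro/micro F1, metrics commonly used in fine-grained entity typing. The `Mention` class, the entries of `FINE_TO_GENERAL`, and all function names are hypothetical illustrations, not the corpus's actual schema or its list of 10 general types.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Mention:
    """Hypothetical record: a mention span in a sentence plus its free-form fine-grained types."""
    sentence: str
    start: int  # character offset where the mention begins (inclusive)
    end: int    # character offset where the mention ends (exclusive)
    fine_types: Set[str] = field(default_factory=set)

    @property
    def text(self) -> str:
        return self.sentence[self.start:self.end]


# Hypothetical fine-to-general mapping; the corpus's real 10 general types are not reproduced here.
FINE_TO_GENERAL: Dict[str, str] = {
    "singer": "person",
    "actor": "person",
    "city": "location",
    "university": "organization",
}


def to_general(fine_types: Set[str]) -> Set[str]:
    """Collapse free-form fine-grained types into general types, skipping unmapped labels."""
    return {FINE_TO_GENERAL[t] for t in fine_types if t in FINE_TO_GENERAL}


def strict_accuracy(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Fraction of mentions whose predicted type set exactly equals the gold set."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def _f1(prec: float, rec: float) -> float:
    return 2 * prec * rec / (prec + rec) if prec + rec > 0 else 0.0


def macro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Loose macro F1: per-mention precision averaged over non-empty predictions,
    per-mention recall averaged over non-empty gold sets."""
    precs = [len(g & p) / len(p) for g, p in zip(gold, pred) if p]
    recs = [len(g & p) / len(g) for g, p in zip(gold, pred) if g]
    prec = sum(precs) / len(precs) if precs else 0.0
    rec = sum(recs) / len(recs) if recs else 0.0
    return _f1(prec, rec)


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Loose micro F1: precision and recall pooled over all predicted and gold labels."""
    overlap = sum(len(g & p) for g, p in zip(gold, pred))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    prec = overlap / n_pred if n_pred else 0.0
    rec = overlap / n_gold if n_gold else 0.0
    return _f1(prec, rec)


if __name__ == "__main__":
    m = Mention("Jay Chou is a singer and actor.", 0, 8, {"singer", "actor"})
    print(m.text)                   # Jay Chou
    print(to_general(m.fine_types))  # {'person'}

    gold = [{"singer", "actor"}, {"city"}]
    pred = [{"singer"}, {"city", "university"}]
    print(strict_accuracy(gold, pred))   # 0.0
    print(macro_f1(gold, pred))          # 0.75
    print(round(micro_f1(gold, pred), 4))  # 0.6667
```

Keeping the fine-grained labels as an open set of strings mirrors the free-form annotation described above, while the separate mapping lets the same records serve a coarse 10-type setting without re-annotation.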