Paper Title

SemClinBr -- a multi institutional and multi specialty semantically annotated corpus for Portuguese clinical NLP tasks

Authors

Oliveira, Lucas Emanuel Silva e; Peters, Ana Carolina; da Silva, Adalniza Moura Pucca; Gebeluca, Caroline P.; Gumiel, Yohan Bonescki; Cintho, Lilian Mie Mukai; Carvalho, Deborah Ribeiro; Hasan, Sadid A.; Moro, Claudia Maria Cabral

Abstract

The high volume of research on extracting patient information from electronic health records (EHRs) has increased the demand for annotated corpora, which are a valuable resource for both developing and evaluating natural language processing (NLP) algorithms. The absence of a multi-purpose clinical corpus outside the English language, especially for Brazilian Portuguese, is glaring and severely hampers scientific progress in biomedical NLP. In this study, we developed a semantically annotated corpus using clinical texts from multiple medical specialties, document types, and institutions. We present the following: (1) a survey listing common aspects and lessons learned from previous research, (2) a fine-grained annotation schema that can be replicated and can guide other annotation initiatives, (3) a web-based annotation tool focusing on an annotation suggestion feature, and (4) both intrinsic and extrinsic evaluation of the annotations. The result of this work is SemClinBr, a corpus of 1,000 clinical notes labeled with 65,117 entities and 11,263 relations, which can support a variety of clinical NLP tasks and boost the secondary use of EHRs for the Portuguese language.
