Paper Title
Extracting Concepts for Precision Oncology from the Biomedical Literature
Paper Authors
Paper Abstract
This paper describes an initial dataset and an automatic natural language processing (NLP) method for extracting concepts related to precision oncology from biomedical research articles. We extract five concept types: Cancer, Mutation, Population, Treatment, and Outcome. A corpus of 250 biomedical abstracts was annotated with these concepts following standard double-annotation procedures. We then experiment with BERT-based models for concept extraction. The best-performing model achieved a precision of 63.8%, a recall of 71.9%, and an F1 of 67.1%. Finally, we propose additional research directions for improving extraction performance and for using the NLP system in downstream precision oncology applications.
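For readers unfamiliar with how precision, recall, and F1 are typically computed for concept extraction, the following is a minimal sketch. It assumes exact-match scoring over typed spans, which the abstract does not specify; the function name `prf1`, the `(start, end, type)` span representation, and the toy spans are all hypothetical and are not taken from the paper or its data.

```python
def prf1(gold, pred):
    """Exact-match precision, recall, and F1 over (start, end, type) spans.

    Assumption: a predicted span counts as correct only if its boundaries
    and its concept type both match a gold span exactly.
    """
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)  # spans present in both sets
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy, made-up annotations for a single abstract (token offsets).
gold = [
    (0, 2, "Cancer"),
    (5, 6, "Mutation"),
    (9, 11, "Treatment"),
    (14, 16, "Outcome"),
]
pred = [
    (0, 2, "Cancer"),
    (5, 6, "Mutation"),
    (9, 12, "Treatment"),     # boundary error: not an exact match
    (14, 16, "Outcome"),
    (20, 21, "Population"),   # spurious prediction
]

p, r, f = prf1(gold, pred)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Exact matching is the strictest common choice; a relaxed (overlap-based) criterion or per-type averaging would give different scores on the same predictions.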