CCS Explorer: Relevance Prediction, Extractive Summarization, and Named Entity Recognition from Clinical Cohort Studies

Paper Title

CCS Explorer: Relevance Prediction, Extractive Summarization, and Named Entity Recognition from Clinical Cohort Studies

Authors

Al-Hussaini, Irfan; An, Davi Nakajima; Lee, Albert J.; Bi, Sarah; Mitchell, Cassie S.

Abstract

Clinical Cohort Studies (CCS), such as randomized clinical trials, are a great source of documented clinical research. Ideally, a clinical expert inspects these articles for exploratory analysis ranging from drug discovery for evaluating the efficacy of existing drugs in tackling emerging diseases to the first test of newly developed drugs. However, more than 100 articles are published daily on a single prevalent disease like COVID-19 in PubMed. As a result, it can take days for a physician to find articles and extract relevant information. Can we develop a system to sift through the long list of these articles faster and document the crucial takeaways from each of these articles? In this work, we propose CCS Explorer, an end-to-end system for relevance prediction of sentences, extractive summarization, and patient, outcome, and intervention entity detection from CCS. CCS Explorer is packaged in a web-based graphical user interface where the user can provide any disease name. CCS Explorer then extracts and aggregates all relevant information from articles on PubMed based on the results of an automatically generated query produced on the back-end. For each task, CCS Explorer fine-tunes pre-trained language representation models based on transformers with additional layers. The models are evaluated using two publicly available datasets. CCS Explorer obtains a recall of 80.2%, AUC-ROC of 0.843, and an accuracy of 88.3% on sentence relevance prediction using BioBERT and achieves an average Micro F1-Score of 77.8% on Patient, Intervention, Outcome detection (PIO) using PubMedBERT. Thus, CCS Explorer can reliably extract relevant information to summarize articles, saving time by $\sim \text{660}\times$.
