Paper Title
Understanding Clinical Trial Reports: Extracting Medical Entities and Their Relations
Paper Authors
Paper Abstract
The best evidence concerning comparative treatment effectiveness comes from clinical trials, the results of which are reported in unstructured articles. Medical experts must manually extract information from articles to inform decision-making, which is time-consuming and expensive. Here we consider the end-to-end task of both (a) extracting treatments and outcomes from full-text articles describing clinical trials (entity identification) and (b) inferring the reported results for the former with respect to the latter (relation extraction). We introduce new data for this task, and evaluate models that have recently achieved state-of-the-art results on similar tasks in Natural Language Processing. We then propose a new method, motivated by how trial results are typically presented, that outperforms these purely data-driven baselines. Finally, we run a fielded evaluation of the model with a non-profit seeking to identify existing drugs that might be re-purposed for cancer, showing the potential utility of end-to-end evidence extraction systems.