Paper Title

Clinical Reading Comprehension: A Thorough Analysis of the emrQA Dataset

Authors

Xiang Yue, Bernal Jimenez Gutierrez, Huan Sun

Abstract

Machine reading comprehension has made great progress in recent years owing to large-scale annotated datasets. In the clinical domain, however, creating such datasets is quite difficult due to the domain expertise required for annotation. Recently, Pampari et al. (EMNLP'18) tackled this issue by using expert-annotated question templates and existing i2b2 annotations to create emrQA, the first large-scale dataset for question answering (QA) based on clinical notes. In this paper, we provide an in-depth analysis of this dataset and the clinical reading comprehension (CliniRC) task. From our qualitative analysis, we find that (i) emrQA answers are often incomplete, and (ii) emrQA questions are often answerable without using domain knowledge. From our quantitative experiments, surprising results include that (iii) using a small sampled subset (5%-20%), we can obtain roughly equal performance compared to the model trained on the entire dataset, (iv) this performance is close to human experts' performance, and (v) BERT models do not beat the best-performing base model. Following our analysis of emrQA, we further explore two desired aspects of CliniRC systems: the ability to utilize clinical domain knowledge and to generalize to unseen questions and contexts. We argue that both should be considered when creating future datasets.
