Paper title
BIOMRC: A Dataset for Biomedical Machine Reading Comprehension
Paper authors
Paper abstract
We introduce BIOMRC, a large-scale cloze-style biomedical MRC dataset. Care was taken to reduce noise, compared to the previous BIOREAD dataset of Pappas et al. (2018). Experiments show that simple heuristics do not perform well on the new dataset, and that two neural MRC models that had been tested on BIOREAD perform much better on BIOMRC, indicating that the new dataset is indeed less noisy or at least that its task is more feasible. Non-expert human performance is also higher on the new dataset than on BIOREAD, and biomedical experts perform even better. We also introduce a new BERT-based MRC model, the best version of which substantially outperforms all other methods tested, reaching or surpassing the accuracy of biomedical experts in some experiments. We make the new dataset available in three different sizes, release our code, and provide a leaderboard.
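To make the cloze-style task and the notion of a "simple heuristic" concrete, the following is a minimal sketch, not the authors' code or data format. It assumes a hypothetical instance layout in which entities appear as pseudo-identifiers such as `@entity1`, the missing entity in the question is marked by a placeholder `XXXX`, and the system must pick the answer from a list of candidates. The class `ClozeInstance`, the functions `most_frequent_candidate` and `accuracy`, and the toy passage are all illustrative assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ClozeInstance:
    """Hypothetical cloze-style MRC instance (layout assumed for illustration)."""
    passage: str           # text containing entity pseudo-identifiers
    question: str          # sentence with one entity replaced by a placeholder
    candidates: List[str]  # entity identifiers to choose from
    answer: str            # the identifier that fills the placeholder


def most_frequent_candidate(instance: ClozeInstance) -> str:
    """Heuristic baseline: pick the candidate mentioned most often in the passage.

    Ties are resolved in favour of the candidate listed first.
    """
    candidate_set = set(instance.candidates)
    counts = Counter(tok for tok in instance.passage.split() if tok in candidate_set)
    return max(instance.candidates, key=lambda c: counts[c])


def accuracy(instances: List[ClozeInstance],
             predict: Callable[[ClozeInstance], str]) -> float:
    """Fraction of instances for which the prediction equals the gold answer."""
    correct = sum(predict(inst) == inst.answer for inst in instances)
    return correct / len(instances)


# Toy example, invented for illustration only.
toy = ClozeInstance(
    passage="@entity1 inhibits @entity2 . @entity1 is also linked to @entity3 .",
    question="XXXX inhibits @entity2 in vitro .",
    candidates=["@entity1", "@entity2", "@entity3"],
    answer="@entity1",
)

prediction = most_frequent_candidate(toy)
score = accuracy([toy], most_frequent_candidate)
```

A frequency baseline of this kind ignores the question entirely, which is one reason such heuristics would be expected to fall short of neural models that match the question against the passage.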