Paper Title
A Data-driven Approach for Noise Reduction in Distantly Supervised Biomedical Relation Extraction
Paper Authors
Paper Abstract
Fact triples are a common form of structured knowledge used within the biomedical domain. As the amount of unstructured scientific text continues to grow, manual annotation of these texts for the task of relation extraction becomes increasingly expensive. Distant supervision offers a viable approach to combat this by quickly producing large amounts of labeled, but considerably noisy, data. We aim to reduce such noise by extending an entity-enriched relation classification BERT model to the problem of multiple instance learning, and defining a simple data encoding scheme that significantly reduces noise, reaching state-of-the-art performance for distantly supervised biomedical relation extraction. Our approach further encodes knowledge about the direction of relation triples, allowing for increased focus on relation learning by reducing noise and alleviating the need for joint learning with knowledge graph completion.
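To make the setting in the abstract concrete, the following is a minimal, generic sketch of how distant supervision groups sentences into bags for multiple instance learning. It is not the paper's encoding scheme: the function names `build_bags` and `mark_entities`, the marker tokens `[E1]`/`[E2]`, and the plain substring matching are all assumptions chosen for illustration. Tagging the head entity with `[E1]` and the tail with `[E2]` regardless of their order in the sentence is one plausible way to keep the direction of the triple visible to a classifier.

```python
from collections import defaultdict


def mark_entities(text, head, tail):
    """Wrap the first mention of head and tail in hypothetical marker tokens.

    [E1] always tags the head of the triple and [E2] the tail, independent
    of which appears first in the sentence (an illustrative assumption).
    """
    text = text.replace(head, f"[E1] {head} [/E1]", 1)
    text = text.replace(tail, f"[E2] {tail} [/E2]", 1)
    return text


def build_bags(triples, sentences):
    """Group sentences into bags keyed by a (head, relation, tail) triple.

    Distant supervision assumption: any sentence mentioning both entities
    of a known triple is labeled with that triple's relation. This is the
    source of the label noise the abstract refers to.
    """
    bags = defaultdict(list)
    for text in sentences:
        for head, relation, tail in triples:
            # Naive substring matching stands in for real entity linking.
            if head in text and tail in text:
                bags[(head, relation, tail)].append(
                    mark_entities(text, head, tail)
                )
    return dict(bags)


if __name__ == "__main__":
    triples = [("aspirin", "treats", "headache")]
    sentences = [
        "aspirin relieves headache quickly.",
        "headache was not reduced by aspirin in this trial.",  # noisy match
        "ibuprofen is an NSAID.",
    ]
    for key, bag in build_bags(triples, sentences).items():
        print(key)
        for sentence in bag:
            print("  ", sentence)
```

The second sentence lands in the bag even though it does not express the relation, which is the kind of noisy label a bag-level model has to tolerate.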