Paper Title
Beyond MeSH: Fine-Grained Semantic Indexing of Biomedical Literature based on Weak Supervision
Paper Authors
Paper Abstract
In this work, we propose a method for the automated refinement of subject annotations in biomedical literature at the level of concepts. Semantic indexing and search of biomedical articles in MEDLINE/PubMed are based on semantic subject annotations with MeSH descriptors, each of which may correspond to several related but distinct biomedical concepts. Such descriptor-level annotations do not match the level of detail available in the domain knowledge and may not be sufficient to fulfil the information needs of experts in the domain. To this end, we propose a new method that uses weak supervision to train a concept annotator on the literature available for a particular disease. We test this method on the MeSH descriptors for two diseases: Alzheimer's Disease and Duchenne Muscular Dystrophy. The results indicate that concept occurrence is a strong heuristic for automated subject annotation refinement and that its use as weak supervision can lead to improved concept-level annotations. Fine-grained semantic annotations can enable more precise literature retrieval, sustain the semantic integration of subject annotations with other domain resources, and ease the maintenance of consistent subject annotations as new, more detailed entries are added to the MeSH thesaurus over time.
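
The concept-occurrence heuristic mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe the actual pipeline, so everything below is an assumption made for illustration: the concept names, the term lists, and the helper functions `weak_labels` and `build_weak_training_set` are hypothetical and are not taken from MeSH or from the paper. The idea shown is only that an article already indexed with a descriptor receives a weak label for each finer-grained concept whose terms occur in its text, and that these weak labels could then serve as training data for a concept annotator.

```python
import re

# Hypothetical lexicon: concept name -> surface terms.
# Illustrative placeholders only; not actual MeSH concepts or terms.
CONCEPT_TERMS = {
    "concept_a": ["early onset", "presenile"],
    "concept_b": ["late onset"],
}


def weak_labels(text, concept_terms=CONCEPT_TERMS):
    """Return the set of concepts that have at least one term occurring in text."""
    lowered = text.lower()
    labels = set()
    for concept, terms in concept_terms.items():
        for term in terms:
            # Word-boundary match, so a term is not matched inside a longer word.
            pattern = r"\b" + re.escape(term.lower()) + r"\b"
            if re.search(pattern, lowered):
                labels.add(concept)
                break  # one matching term is enough for this concept
    return labels


def build_weak_training_set(articles, concept_terms=CONCEPT_TERMS):
    """Pair each article text with its sorted list of weak concept labels."""
    return [(text, sorted(weak_labels(text, concept_terms))) for text in articles]
```

Labels produced this way are noisy by construction: a term may be mentioned without the concept being a subject of the article. That noise is the reason such labels would be treated as weak supervision for a trained annotator rather than used as final annotations.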