Paper Title

Literature Triage on Genomic Variation Publications by Knowledge-enhanced Multi-channel CNN

Authors

Lv, Chenhui; Lu, Qian; Zhang, Xiang

Abstract

Background: To investigate the correlation between genomic variation and certain diseases or phenotypes, the fundamental task is to identify the relevant publications within a massive body of literature, a task known as literature triage. Knowledge bases such as UniProtKB/Swiss-Prot and the NHGRI-EBI GWAS Catalog were created to collect these publications. The publications are manually curated by experts, which is time-consuming. Moreover, manual curation does not scale, because the number of publications is increasing rapidly. To cut the cost of literature triage, machine-learning models have been adopted to identify relevant biomedical publications automatically. Methods: Compared with previous studies that use machine-learning models for literature triage, we adopt a multi-channel convolutional network that exploits rich textual information while bridging the semantic gaps between different corpora. In addition, knowledge embeddings learned from UMLS are used to provide medical knowledge beyond textual features during triage. Results: We demonstrate that, with the help of knowledge embeddings and multiple channels, our model outperforms state-of-the-art models on five datasets. Our model improves the accuracy of biomedical literature triage. Conclusions: Multiple channels and knowledge embeddings enhance the performance of the CNN model on the task of biomedical literature triage. Keywords: Literature Triage; Knowledge Embedding; Multi-channel Convolutional Network
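
The general idea of a multi-channel convolutional classifier with knowledge embeddings can be illustrated with a toy forward pass. This is only a sketch under stated assumptions, not the authors' architecture: the abstract does not specify how the channels are combined or how UMLS concepts are attached to tokens. Here each channel is assumed to be a word-embedding table from a different corpus, the knowledge vector is assumed to be looked up by token and concatenated to the word vector, and each filter's responses are assumed to be summed across channels before max-over-time pooling. All names (make_embedding, conv_max_pool, triage_score), the random weights, and the example sentence are hypothetical.

```python
import math
import random

def make_embedding(vocab, dim, seed):
    """Random stand-in for a pretrained embedding table (hypothetical)."""
    rng = random.Random(seed)
    return {w: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for w in vocab}

def conv_max_pool(channels, filters, width):
    """Apply each filter to every channel, sum over channels, ReLU, max over time.

    channels: list of matrices, each [seq_len][dim]
    filters:  list of filters, each [n_channels][width][dim]
    """
    seq_len = len(channels[0])
    pooled = []
    for filt in filters:
        best = 0.0  # starting at zero acts as the ReLU floor
        for start in range(seq_len - width + 1):
            total = 0.0
            for c, channel in enumerate(channels):
                for offset in range(width):
                    row = channel[start + offset]
                    total += sum(x * w for x, w in zip(row, filt[c][offset]))
            best = max(best, total)
        pooled.append(best)
    return pooled

def triage_score(tokens, channel_tables, knowledge_table, filters, width,
                 out_weights, bias):
    """Return a relevance probability in (0, 1) for one tokenized document."""
    dim = len(next(iter(channel_tables[0].values())))
    kdim = len(next(iter(knowledge_table.values())))
    channels = []
    for table in channel_tables:
        channel = []
        for tok in tokens:
            word_vec = table.get(tok, [0.0] * dim)
            # Assumption: the knowledge vector is keyed by token; a real system
            # would first map tokens to UMLS concepts.
            know_vec = knowledge_table.get(tok, [0.0] * kdim)
            channel.append(word_vec + know_vec)
        channels.append(channel)
    features = conv_max_pool(channels, filters, width)
    logit = bias + sum(f * w for f, w in zip(features, out_weights))
    return 1.0 / (1.0 + math.exp(-logit))

# Toy setup: two corpus-specific channels plus a small knowledge table.
tokens = ["variant", "brca1", "is", "associated", "with", "breast", "cancer"]
general = make_embedding(tokens, 4, seed=1)      # e.g. a general-domain corpus
biomedical = make_embedding(tokens, 4, seed=2)   # e.g. a biomedical corpus
knowledge = make_embedding(["brca1", "cancer"], 2, seed=3)

rng = random.Random(0)
n_filters, width, in_dim, n_channels = 3, 2, 4 + 2, 2
filters = [[[[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
             for _ in range(width)]
            for _ in range(n_channels)]
           for _ in range(n_filters)]
out_weights = [rng.uniform(-1.0, 1.0) for _ in range(n_filters)]

score = triage_score(tokens, [general, biomedical], knowledge,
                     filters, width, out_weights, bias=0.0)
print(f"relevance score: {score:.3f}")
```

With untrained random weights the score itself is meaningless; the point is the data flow. In practice the filters and output weights would be learned from labeled triage data, and a tensor library would replace the explicit loops.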
