CODER: Knowledge-infused cross-lingual medical term embedding for term normalization

Title

CODER: Knowledge infused cross-lingual medical term embedding for term normalization

Authors

Yuan, Zheng; Zhao, Zhengyun; Sun, Haixia; Li, Jiao; Wang, Fei; Yu, Sheng

Abstract

This paper proposes CODER: contrastive learning on knowledge graphs for cross-lingual medical term representation. CODER is designed for medical term normalization: it provides close vector representations for different terms that represent the same or similar medical concepts, with cross-lingual support. We train CODER via contrastive learning on a medical knowledge graph (KG), the Unified Medical Language System, where similarities are calculated using both terms and relation triplets from the KG. Training with relations injects medical knowledge into the embeddings and aims to provide potentially better machine learning features. We evaluate CODER on zero-shot term normalization, semantic similarity, and relation classification benchmarks, which show that CODER outperforms various state-of-the-art biomedical word embeddings, concept embeddings, and contextual embeddings. Our code and models are available at https://github.com/GanjinZero/CODER.
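The abstract frames term normalization as giving synonymous terms (possibly in different languages) nearby vectors, so that a new term can be mapped to a concept without task-specific training. The sketch below shows only that zero-shot lookup step, assuming a nearest-neighbour search by cosine similarity over a dictionary of concept synonyms. The function names `cosine` and `normalize_term`, the concept IDs, and all vectors are hypothetical toy values; this does not reproduce CODER's contrastive training or the API of the released models.

```python
import math

# A term embedding is represented as a plain list of floats.
Vector = list[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def normalize_term(query: Vector, concept_index: dict[str, list[Vector]]) -> str:
    """Return the (hypothetical) concept ID whose closest synonym embedding
    is most similar to the query embedding."""
    if not concept_index:
        raise ValueError("concept_index must not be empty")
    best_id, best_score = "", -math.inf
    for concept_id, synonym_vectors in concept_index.items():
        # A concept's score is the similarity of its best-matching synonym.
        score = max(cosine(query, v) for v in synonym_vectors)
        if score > best_score:
            best_id, best_score = concept_id, score
    return best_id


# Toy index: made-up concept IDs and made-up 3-dimensional embeddings.
index = {
    "C_HEART_ATTACK": [[0.9, 0.1, 0.0], [0.85, 0.2, 0.05]],
    "C_STROKE": [[0.1, 0.9, 0.2]],
}
# Pretend this vector is the embedding of a term in another language.
print(normalize_term([0.88, 0.15, 0.02], index))
```

Taking the maximum over each concept's synonyms, rather than averaging them, keeps the lookup sensitive to whichever surface form the query actually resembles; in practice the embeddings would come from the trained encoder instead of being hand-written.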
