Paper Title

Enriching Consumer Health Vocabulary Using Enhanced GloVe Word Embedding

Authors

Ibrahim, Mohammed; Gauch, Susan; Salman, Omar; Alqahatani, Mohammed

Abstract

The Open-Access and Collaborative Consumer Health Vocabulary (OAC CHV, or CHV for short) is a collection of medical terms written in plain English. It provides a list of simple, clear terms that laymen prefer to use in place of the equivalent professional medical terms. The National Library of Medicine (NLM) has integrated the CHV terms into its Unified Medical Language System (UMLS), where they are mapped to 56,000 professional concepts. We found that about 48% of these laymen's terms are still jargon and match the professional terms in the UMLS. In this paper, we present an enhanced word embedding technique that generates new CHV terms from consumer-generated text. We downloaded our corpus from a healthcare social media platform and evaluated our new method, which applies iterative feedback to the word embedding, against ground truth built from the existing CHV terms. Our feedback algorithm outperformed unmodified GloVe and detected new CHV terms.
