Medical symptom recognition from patient text: An active learning approach for long-tailed multilabel distributions

Paper title

Medical symptom recognition from patient text: An active learning approach for long-tailed multilabel distributions

Authors

Mottaghi, Ali; Sarma, Prathusha K.; Amatriain, Xavier; Yeung, Serena; Kannan, Anitha

Abstract

We study the problem of medical symptom recognition from patient text, for the purpose of gathering pertinent information from the patient (known as history-taking). A typical patient text describes the symptoms the patient is experiencing, and a single instance of such a text can be "labeled" with multiple symptoms. This makes learning a medical symptom recognizer challenging on account of i) the lack of large volumes of annotated data and ii) the large, unknown universe of symptoms that a single text can map to. Furthermore, patient text is often characterized by a long tail in the label distribution (i.e., some labels/symptoms occur far more frequently than others, e.g., "fever" vs. "hematochezia"). In this paper, we introduce an active learning method that leverages the underlying structure of a continually refined, learned latent space to select the most informative examples to label. Despite the long tail in the data distribution, this selection progressively increases the learned model's coverage of the universe of symptoms.
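The abstract does not spell out the selection rule, so the sketch below is NOT the authors' method. As an assumed stand-in, it shows one common way to "use the structure of a latent space" in active learning: k-center greedy (core-set style) selection, which repeatedly picks the unlabeled example whose embedding is farthest from everything already labeled or selected. The function names `euclidean` and `k_center_greedy`, the toy 2-D "embeddings", and the use of Euclidean distance are all hypothetical choices for illustration. One plausible intuition for the long-tail setting is that examples in sparsely covered regions of the latent space are less likely to be near-duplicates of frequent-symptom texts.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]  # hypothetical: one latent embedding per patient text


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_center_greedy(
    labeled: List[Vector],
    unlabeled: List[Vector],
    budget: int,
) -> List[int]:
    """Return `budget` indices into `unlabeled`, chosen by k-center greedy.

    Assumed stand-in for latent-space selection, not the paper's algorithm:
    each step takes the candidate farthest from its nearest labeled or
    already-selected point.
    """
    if budget > len(unlabeled):
        raise ValueError("budget exceeds number of unlabeled examples")
    # Distance from each candidate to its nearest labeled point.
    # With no labeled points yet, every candidate starts infinitely far away.
    min_dist = [
        min((euclidean(u, l) for l in labeled), default=math.inf)
        for u in unlabeled
    ]
    selected: List[int] = []
    for _ in range(budget):
        # Candidate least covered by the current labeled + selected set
        # (ties go to the lowest index, since max() keeps the first maximum).
        best = max(
            (i for i in range(len(unlabeled)) if i not in selected),
            key=lambda i: min_dist[i],
        )
        selected.append(best)
        # The newly selected point now "covers" nearby candidates.
        for i, u in enumerate(unlabeled):
            min_dist[i] = min(min_dist[i], euclidean(u, unlabeled[best]))
    return selected


if __name__ == "__main__":
    labeled = [(0.0, 0.0)]
    unlabeled = [(0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (-4.0, 0.0)]
    print(k_center_greedy(labeled, unlabeled, budget=2))  # → [2, 3]
```

In the toy run, only one of the two near-identical points at (5, 5) is chosen; the second pick goes to the uncovered point at (-4, 0) rather than the redundant neighbour. In a real system the embeddings would come from the model being trained and would be recomputed as it is refined, which this sketch does not model.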
