Temporal Positive-unlabeled Learning for Biomedical Hypothesis Generation via Risk Estimation

Paper title

Temporal Positive-unlabeled Learning for Biomedical Hypothesis Generation via Risk Estimation

Authors

Akujuobi, Uchenna; Chen, Jun; Elhoseiny, Mohamed; Spranger, Michael; Zhang, Xiangliang

Abstract

Understanding the relationships between biomedical terms such as viruses, drugs, and symptoms is essential in the fight against diseases. Many attempts have been made to introduce machine learning into the scientific process of hypothesis generation (HG), which refers to the discovery of meaningful implicit connections between biomedical terms. However, most existing methods fail to truly capture the temporal dynamics of scientific term relations, and they also assume unobserved connections to be irrelevant (i.e., they operate in a positive-negative (PN) learning setting). To break these limits, we formulate the HG problem as a future connectivity prediction task on a dynamic attributed graph via positive-unlabeled (PU) learning. The key is then to capture the temporal evolution of node pair (term pair) relations from just the positive and unlabeled data. We propose a variational inference model to estimate the positive prior and incorporate it into the learning of node pair embeddings, which are then used for link prediction. Experimental results on real-world biomedical term relationship datasets and case study analyses on a COVID-19 dataset validate the effectiveness of the proposed model.
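
To make the PU setting in the abstract concrete, the sketch below computes a generic non-negative PU risk from classifier scores on positive and unlabeled node pairs. It is an illustration of PU risk estimation in general, not the model proposed in the paper: the sigmoid surrogate loss, the function names, and the fixed `prior` argument are all assumptions (the paper estimates the positive prior with a variational inference model rather than taking it as given).

```python
import math


def sigmoid_loss(score, label):
    """Sigmoid surrogate loss; small when sign(score) agrees with label (+1 or -1)."""
    return 1.0 / (1.0 + math.exp(label * score))


def mean(values):
    return sum(values) / len(values)


def nn_pu_risk(pos_scores, unl_scores, prior):
    """Non-negative PU risk estimate (hypothetical helper, for illustration only).

    pos_scores: classifier scores for observed (positive) term pairs
    unl_scores: classifier scores for unlabeled term pairs
    prior:      assumed fraction of positives among the unlabeled pairs
    """
    # Risk of labeling positives as positive.
    risk_pos_as_pos = mean([sigmoid_loss(s, +1) for s in pos_scores])
    # Risk of labeling positives as negative (used to correct the unlabeled term).
    risk_pos_as_neg = mean([sigmoid_loss(s, -1) for s in pos_scores])
    # Risk of labeling unlabeled pairs as negative.
    risk_unl_as_neg = mean([sigmoid_loss(s, -1) for s in unl_scores])
    # Estimated negative-class risk; clamp at zero so it cannot go negative.
    negative_part = risk_unl_as_neg - prior * risk_pos_as_neg
    return prior * risk_pos_as_pos + max(0.0, negative_part)
```

The clamp at zero is the design choice that distinguishes this variant from the plain unbiased PU estimator: without it, a flexible model can drive the estimated negative-class risk below zero and overfit.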
