Paper Title

Learning from Positive and Unlabeled Data with Arbitrary Positive Shift

Paper Authors

Zayd Hammoudeh, Daniel Lowd

Paper Abstract

Positive-unlabeled (PU) learning trains a binary classifier using only positive and unlabeled data. A common simplifying assumption is that the positive data is representative of the target positive class. This assumption rarely holds in practice due to temporal drift, domain shift, and/or adversarial manipulation. This paper shows that PU learning is possible even with arbitrarily non-representative positive data given unlabeled data from the source and target distributions. Our key insight is that only the negative class's distribution need be fixed. We integrate this into two statistically consistent methods to address arbitrary positive bias - one approach combines negative-unlabeled learning with unlabeled-unlabeled learning while the other uses a novel, recursive risk estimator. Experimental results demonstrate our methods' effectiveness across numerous real-world datasets and forms of positive bias, including disjoint positive class-conditional supports. Additionally, we propose a general, simplified approach to address PU risk estimation overfitting.
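For background, standard PU risk estimation rewrites the negative-class term of the classification risk using unlabeled data and the positive class prior, and clamping that term at zero is the usual remedy for the overfitting the abstract refers to. The sketch below is a minimal NumPy illustration of that non-negative PU risk from prior work, not the paper's recursive estimator or its source/target handling; the function name, hinge loss, and synthetic inputs are purely illustrative.

```python
import numpy as np

def nn_pu_risk(g_pos, g_unl, prior, loss=lambda z: np.maximum(0.0, 1.0 - z)):
    """Non-negative PU risk estimate (background sketch, not this paper's method).

    g_pos : decision values g(x) on labeled-positive examples
    g_unl : decision values g(x) on unlabeled examples
    prior : assumed positive class prior pi = P(y = +1)
    loss  : surrogate loss ell(z); hinge loss by default
    """
    # Positive part of the risk: pi * E_p[ell(g(x))]
    risk_pos = prior * np.mean(loss(g_pos))
    # Negative part estimated from unlabeled data:
    # E_u[ell(-g(x))] - pi * E_p[ell(-g(x))], clamped at zero to curb overfitting
    risk_neg = np.mean(loss(-g_unl)) - prior * np.mean(loss(-g_pos))
    return risk_pos + max(0.0, risk_neg)

# Toy usage with synthetic decision values
rng = np.random.default_rng(0)
print(nn_pu_risk(rng.normal(1.0, 1.0, 200), rng.normal(0.0, 1.5, 1000), prior=0.4))
```

The clamp (`max(0.0, risk_neg)`) keeps the empirical negative-class risk from going negative during training, which is the standard simple fix that the paper's more general approach to PU risk-estimation overfitting builds upon.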
