Paper Title

Learning from Positive and Unlabeled Data with Augmented Classes

Authors

Zhongnian Li, Liutao Yang, Zhongchen Ma, Tongfeng Sun, Xinzheng Xu, Daoqiang Zhang

Abstract

Positive-unlabeled (PU) learning aims to learn a binary classifier from only positive and unlabeled data, a setting that arises in many real-world scenarios. However, existing PU learning algorithms cannot handle open and changing environments, where examples from unobserved augmented classes may emerge in the testing phase. In this paper, we propose an unbiased risk estimator for PU learning with Augmented Classes (PUAC) that utilizes unlabeled data from the augmented-class distribution, which can be easily collected in many real-world scenarios. In addition, we derive an estimation error bound for the proposed estimator, which provides a theoretical guarantee for its convergence to the optimal solution. Experiments on multiple realistic datasets demonstrate the effectiveness of the proposed approach.
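
For readers unfamiliar with unbiased risk estimation in PU learning, the following is a minimal sketch of the classical unbiased PU risk for a binary classifier. It is not the PUAC estimator proposed in the paper, whose form the abstract does not give. The function names, the choice of sigmoid loss, and the assumption that the positive class prior is known are all illustrative assumptions.

```python
import math


def sigmoid_loss(margin):
    """Sigmoid loss l(z) = 1 / (1 + exp(z)) on the margin z = y * f(x).

    Written via tanh so that large margins do not overflow.
    """
    return 0.5 * (1.0 - math.tanh(margin / 2.0))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def upu_risk(scores_pos, scores_unl, class_prior):
    """Classical unbiased PU risk (illustrative; NOT the paper's PUAC estimator).

    scores_pos:  classifier outputs f(x) on labeled positive examples
    scores_unl:  classifier outputs f(x) on unlabeled examples
    class_prior: positive class prior pi in (0, 1), assumed known here
    """
    pos_as_pos = mean(sigmoid_loss(s) for s in scores_pos)    # E_p[l(f(x), +1)]
    pos_as_neg = mean(sigmoid_loss(-s) for s in scores_pos)   # E_p[l(f(x), -1)]
    unl_as_neg = mean(sigmoid_loss(-s) for s in scores_unl)   # E_u[l(f(x), -1)]
    # The negative-class risk is recovered from unlabeled data by
    # subtracting the positive examples' contribution.
    return class_prior * pos_as_pos + unl_as_neg - class_prior * pos_as_neg
```

Because of the subtracted term, this empirical risk can become negative on finite samples, for example when the classifier scores every positive example very high and every unlabeled example very low.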
