Binary Classification from Positive Data with Skewed Confidence

Paper title

Binary Classification from Positive Data with Skewed Confidence

Authors

Kazuhiko Shinoda, Hirotaka Kaji, Masashi Sugiyama

Abstract

Positive-confidence (Pconf) classification [Ishida et al., 2018] is a promising weakly-supervised learning method that trains a binary classifier only from positive data equipped with confidence. In practice, however, the confidence may be skewed by bias arising in the annotation process. A Pconf classifier cannot be learned properly from skewed confidence, and classification performance may deteriorate as a result. In this paper, we introduce a parameterized model of skewed confidence and propose a method for selecting the hyperparameter that cancels out the negative impact of skewed confidence, under the assumption that the misclassification rate of positive samples is available as prior knowledge. We demonstrate the effectiveness of the proposed method through a synthetic experiment with simple linear models and through benchmark problems with neural network models. We also apply our method to driver drowsiness prediction to show that it works well on a real-world problem where confidence is obtained from manual annotation.
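For readers unfamiliar with the baseline, the following is a minimal sketch of the empirical Pconf risk from Ishida et al. [2018], in which each positive sample x with confidence r(x) = p(y=+1|x) contributes l(g(x)) + ((1 - r(x)) / r(x)) * l(-g(x)). The logistic loss, the function names, and the example numbers are assumptions chosen for illustration, and the constant class-prior factor is dropped because it does not affect the minimizer. The squared confidence at the end is a purely hypothetical skew used only to show that the risk depends on the confidence values; it is not the parameterized model proposed in this paper.

```python
import numpy as np


def logistic_loss(z):
    """Logistic loss log(1 + exp(-z)), computed in a numerically stable way."""
    return np.logaddexp(0.0, -np.asarray(z, dtype=float))


def pconf_risk(scores, conf):
    """Empirical Pconf risk for classifier outputs on positive samples.

    scores: g(x_i) for each positive sample x_i
    conf:   r(x_i) = p(y=+1 | x_i), each in (0, 1]
    The class-prior constant is omitted (it only rescales the risk).
    """
    scores = np.asarray(scores, dtype=float)
    conf = np.asarray(conf, dtype=float)
    weight = (1.0 - conf) / conf  # weight on the "negative side" loss
    return float(np.mean(logistic_loss(scores) + weight * logistic_loss(-scores)))


if __name__ == "__main__":
    scores = [2.0, 0.5, -1.0]            # hypothetical classifier outputs
    conf = np.array([0.95, 0.70, 0.60])  # hypothetical true confidences
    skewed = conf ** 2                   # hypothetical skew, illustration only
    print("risk with given confidence: ", pconf_risk(scores, conf))
    print("risk with skewed confidence:", pconf_risk(scores, skewed))
```

Because the confidence enters the risk only through the weight (1 - r) / r, any systematic distortion of r changes the objective being minimized, which is the failure mode the abstract describes.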

Paper title