Paper title
S2OSC: A Holistic Semi-Supervised Approach for Open Set Classification
Paper authors
Paper abstract
Open set classification (OSC) tackles the problem of determining whether data are in-class or out-of-class during inference when only a set of in-class examples is available at training time. Traditional OSC methods usually train discriminative or generative models on in-class data, then use the pre-trained models to classify test data directly. However, these methods always suffer from the embedding confusion problem: some out-of-class instances are mixed with in-class instances of similar semantics, which makes them difficult to classify. To solve this problem, we bring in semi-supervised learning to develop a novel OSC algorithm, S2OSC, which incorporates out-of-class instance filtering and model re-training in a transductive manner. In detail, given a pool of newly arriving test data, S2OSC first filters distinct out-of-class instances using the pre-trained model and annotates them with a super-class. Then, S2OSC trains a holistic classification model by combining the in-class and out-of-class labeled data with the remaining unlabeled test data in a semi-supervised paradigm, which also integrates the pre-trained model for knowledge distillation to further separate mixed instances. Despite its simplicity, experimental results show that S2OSC achieves state-of-the-art performance across a variety of OSC tasks, including an F1 score of 85.4% on CIFAR-10 with only 300 pseudo-labels. We also demonstrate how S2OSC can be extended effectively to the incremental OSC setting with streaming data.
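The filtering step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how distinct out-of-class instances are scored, so this example assumes, purely for illustration, that the pre-trained model's predictive entropy is used: the test instances with the highest entropy are treated as out-of-class and given a single super-class label, and the rest stay unlabeled for the semi-supervised stage. The function names, the `num_pseudo_labels` parameter, and the choice of `num_known_classes` as the super-class id are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def filter_out_of_class(test_logits, num_known_classes, num_pseudo_labels):
    """Pseudo-label the most uncertain test instances as out-of-class.

    Assumption: uncertainty is measured by the entropy of the pre-trained
    model's softmax output. Returns a dict {index: super_class_id} for the
    selected instances and a sorted list of the remaining (unlabeled) indices.
    """
    scores = [entropy(softmax(row)) for row in test_logits]
    # Rank instances from most to least uncertain.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    picked = ranked[:num_pseudo_labels]
    # Hypothetical convention: the super-class takes the next free label id.
    super_class = num_known_classes
    pseudo_labeled = {i: super_class for i in picked}
    unlabeled = sorted(ranked[num_pseudo_labels:])
    return pseudo_labeled, unlabeled


# Toy logits from a hypothetical 3-class pre-trained model.
logits = [
    [9.0, 0.5, 0.2],  # confident prediction
    [1.0, 1.1, 0.9],  # near-uniform prediction
    [0.1, 8.0, 0.3],  # confident prediction
    [2.0, 2.1, 1.9],  # near-uniform prediction
]
pseudo, rest = filter_out_of_class(logits, num_known_classes=3, num_pseudo_labels=2)
print(pseudo, rest)
```

In this toy run the two near-uniform rows are selected as out-of-class and the two confident rows remain unlabeled. The re-training stage with knowledge distillation is not sketched here, since the abstract gives no detail on its loss or architecture.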