Paper title
Is margin all you need? An extensive empirical study of active learning on tabular data
Paper authors
Paper abstract
Given a labeled training set and a collection of unlabeled data, the goal of active learning (AL) is to identify the best unlabeled points to label. In this comprehensive study, we analyze the performance of a variety of AL algorithms on deep neural networks trained on 69 real-world tabular classification datasets from the OpenML-CC18 benchmark. We consider different data regimes and the effect of self-supervised model pre-training. Surprisingly, we find that the classical margin sampling technique matches or outperforms all others, including the current state of the art, in a wide range of experimental settings. To researchers, we hope to encourage rigorous benchmarking against margin; to practitioners facing tabular data labeling constraints, we suggest that hyper-parameter-free margin may often be all they need.
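For readers unfamiliar with the technique, the following is a minimal sketch of margin sampling as it is commonly defined: each unlabeled point is scored by the gap between its two largest predicted class probabilities, and the points with the smallest gap are sent for labeling. The function names, the batch size, and the toy probabilities are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def margin_scores(probs):
    """Return, for each row, the largest class probability minus the second largest."""
    probs = np.asarray(probs, dtype=float)
    # Sort each row ascending; the last two columns hold the two largest probabilities.
    top_two = np.sort(probs, axis=1)[:, -2:]
    return top_two[:, 1] - top_two[:, 0]


def select_by_margin(probs, batch_size):
    """Return indices of the `batch_size` rows with the smallest margin."""
    scores = margin_scores(probs)
    # A stable sort breaks ties by original index order.
    order = np.argsort(scores, kind="stable")
    return order[:batch_size]


# Toy predicted class probabilities for four unlabeled points (hypothetical values).
probs = np.array([
    [0.90, 0.05, 0.05],  # confident prediction, large margin
    [0.40, 0.35, 0.25],  # uncertain, small margin
    [0.50, 0.30, 0.20],  # moderately uncertain
    [0.34, 0.33, 0.33],  # nearly uniform, smallest margin
])

picked = select_by_margin(probs, batch_size=2)
print(picked)
```

In an active learning loop, `probs` would come from the current model's predictions on the unlabeled pool, and the selected points would be labeled and added to the training set before retraining. The score itself introduces no tunable hyper-parameters.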