Paper title
On tuning a mean-field model for semi-supervised classification
Authors
Abstract
Semi-supervised learning (SSL) has become an active research area because it can learn in scenarios where both labeled and unlabeled data are available. In this work, we focus on transduction, the task of labeling all data presented to the learner, using a mean-field approximation to the Potts model. For this task, we study how classification results depend on $β$ and find that the optimal phase depends strongly on the amount of labeled data available. In the same study, we also observe that classifications that are more stable under small fluctuations in $β$ are associated with high-probability configurations, and we propose a tuning approach based on this observation. The method relies on a novel parameter $γ$, and we evaluate two different values of this parameter against classical methods in the field. The evaluation varies the amount of labeled data available and the number of nearest neighbors in the similarity graph. Empirical results show that the tuning method is effective and allows the mean-field model (NMF) to outperform other approaches on datasets with fewer classes. In addition, one of the chosen values of $γ$ also leads to results that are more resilient to changes in the number of neighbors, which may be of interest to SSL practitioners.
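To make the role of $β$ concrete, the following is a minimal sketch of transduction with a mean-field approximation to a Potts model on a weighted similarity graph. It is an illustration under stated assumptions, not the authors' implementation: the update rule is the textbook self-consistency iteration, the function name `mean_field_potts` and the toy graph are hypothetical, and the $γ$-based tuning procedure is not reproduced because the abstract does not specify it.

```python
import numpy as np


def mean_field_potts(W, labels, n_classes, beta, n_iter=200, tol=1e-8):
    """Mean-field transduction sketch (hypothetical helper, not from the paper).

    W       : (n, n) symmetric non-negative similarity matrix
    labels  : length-n int array, class index for labeled nodes, -1 otherwise
    beta    : inverse-temperature-like coupling strength
    Returns an (n, n_classes) array of per-node class probabilities.
    """
    n = W.shape[0]
    labeled = labels >= 0

    # Unlabeled nodes start uniform; labeled nodes are clamped to one-hot.
    m = np.full((n, n_classes), 1.0 / n_classes)
    m[labeled] = np.eye(n_classes)[labels[labeled]]

    for _ in range(n_iter):
        # Local field: neighbors' class probabilities weighted by similarity.
        field = beta * (W @ m)
        # Subtract the row maximum for a numerically stable softmax.
        field -= field.max(axis=1, keepdims=True)
        new_m = np.exp(field)
        new_m /= new_m.sum(axis=1, keepdims=True)
        # Keep the labeled nodes fixed at their known classes.
        new_m[labeled] = m[labeled]

        converged = np.abs(new_m - m).max() < tol
        m = new_m
        if converged:
            break
    return m


# Toy graph (assumed for illustration): two triangles joined by a weak edge.
W = np.zeros((6, 6))
for group in ([0, 1, 2], [3, 4, 5]):
    for i in group:
        for j in group:
            if i != j:
                W[i, j] = 1.0
W[2, 3] = W[3, 2] = 0.1

labels = np.array([0, -1, -1, -1, -1, 1])  # one labeled node per cluster
probs = mean_field_potts(W, labels, n_classes=2, beta=1.0)
pred = probs.argmax(axis=1)
```

Here `pred` holds the transductive label assigned to every node, and rerunning with different values of `beta` shows how the sharpness of `probs` changes. In practice the similarity matrix would come from a k-nearest-neighbor graph, as the abstract describes, rather than being written by hand.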