Paper Title
LOPS: Learning Order Inspired Pseudo-Label Selection for Weakly Supervised Text Classification

Authors

Dheeraj Mekala, Chengyu Dong, Jingbo Shang

Abstract
Weakly supervised text classification methods typically train a deep neural classifier on pseudo-labels. The quality of these pseudo-labels is crucial to final performance, but they are inevitably noisy due to their heuristic nature, so selecting the correct ones offers huge potential for a performance boost. One straightforward solution is to select samples based on the softmax probability scores that the neural classifier assigns to their pseudo-labels. However, we show experimentally that such solutions are ineffective and unstable due to erroneously high-confidence predictions from poorly calibrated models. Recent studies on the memorization effects of deep neural models suggest that these models first memorize training samples with clean labels and only later those with noisy labels. Inspired by this observation, we propose LOPS, a novel pseudo-label selection method that takes the learning order of samples into consideration. We hypothesize that the learning order reflects the probability of wrong annotation in terms of ranking, and therefore propose to select the samples that are learnt earlier. LOPS can be viewed as a strong performance-boost plug-in for most existing weakly supervised text classification methods, as confirmed by extensive experiments on four real-world datasets.
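To make the core idea concrete, here is a minimal sketch of learning-order-based selection: record the first epoch at which the classifier's prediction matches each sample's pseudo-label, then keep the samples that were learnt earliest. The function names, the representation of prediction history, and the `keep_ratio` parameter are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch of learning-order-based pseudo-label selection.
# pred_history[e][i] is the model's predicted class for sample i at epoch e.

def learning_order(pred_history, pseudo_labels):
    """For each sample, return the first epoch at which the prediction
    matches its pseudo-label (len(pred_history) if it never does)."""
    n_epochs = len(pred_history)
    return [
        next((e for e in range(n_epochs) if pred_history[e][i] == y), n_epochs)
        for i, y in enumerate(pseudo_labels)
    ]

def select_early_learned(pred_history, pseudo_labels, keep_ratio=0.5):
    """Keep the fraction of samples that were learnt earliest."""
    orders = learning_order(pred_history, pseudo_labels)
    ranked = sorted(range(len(pseudo_labels)), key=lambda i: orders[i])
    k = int(len(ranked) * keep_ratio)
    return sorted(ranked[:k])

# Toy example: per-epoch predictions for 4 samples over 3 epochs.
pred_history = [
    [0, 1, 2, 0],  # epoch 0
    [0, 1, 1, 0],  # epoch 1
    [0, 1, 1, 1],  # epoch 2
]
pseudo_labels = [0, 1, 1, 1]
print(select_early_learned(pred_history, pseudo_labels, keep_ratio=0.5))  # → [0, 1]
```

Samples 0 and 1 are predicted correctly from epoch 0 and are kept; samples 2 and 3 are learnt later, which under the paper's hypothesis ranks them as more likely to carry wrong pseudo-labels.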