Paper Title

Zero-Shot Text Classification with Self-Training

Authors

Ariel Gera, Alon Halfon, Eyal Shnarch, Yotam Perlitz, Liat Ein-Dor, Noam Slonim

Abstract

Recent advances in large pretrained language models have increased attention to zero-shot text classification. In particular, models finetuned on natural language inference datasets have been widely adopted as zero-shot classifiers due to their promising results and off-the-shelf availability. However, the fact that such models are unfamiliar with the target task can lead to instability and performance issues. We propose a plug-and-play method to bridge this gap using a simple self-training approach, requiring only the class names along with an unlabeled dataset, and without the need for domain expertise or trial and error. We show that fine-tuning the zero-shot classifier on its most confident predictions leads to significant performance gains across a wide range of text classification tasks, presumably since self-training adapts the zero-shot model to the task at hand.
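The self-training loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `zero_shot_scores` is a hypothetical stand-in for an NLI-based zero-shot classifier (which would score an entailment hypothesis such as "This text is about <class>"), replaced here by a toy keyword scorer so the example is self-contained. The selected pseudo-labeled examples would then be used to fine-tune the zero-shot model.

```python
def zero_shot_scores(text, class_names):
    """Toy stand-in scorer: a real system would return NLI entailment
    probabilities from a pretrained zero-shot classifier."""
    words = text.lower().split()
    raw = {c: 1.0 + words.count(c.lower()) for c in class_names}
    total = sum(raw.values())
    return {c: s / total for c, s in raw.items()}

def select_pseudo_labels(texts, class_names, top_fraction=0.5):
    """Keep only the model's most confident predictions as pseudo-labels,
    mirroring the paper's idea of training on high-confidence examples."""
    scored = []
    for text in texts:
        scores = zero_shot_scores(text, class_names)
        label, conf = max(scores.items(), key=lambda kv: kv[1])
        scored.append((conf, text, label))
    scored.sort(reverse=True)  # most confident predictions first
    k = max(1, int(len(scored) * top_fraction))
    return [(text, label) for _, text, label in scored[:k]]

texts = [
    "great sports match and sports news",
    "the economy and markets today",
    "some unrelated chatter",
    "sports sports sports highlights",
]
pseudo = select_pseudo_labels(texts, ["sports", "economy"], top_fraction=0.5)
# `pseudo` now holds the top-confidence (text, label) pairs; in the paper's
# setting these would serve as training data to fine-tune the classifier.
```

Note that only the class names and an unlabeled text collection are required, matching the plug-and-play setting the abstract describes; the fine-tuning step itself is omitted here.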
