Paper title

Lex2Sent: A bagging approach to unsupervised sentiment analysis

Authors

Kai-Robin Lange, Jonas Rieger, Carsten Jentsch

Abstract

Unsupervised text classification, whose most common form is sentiment analysis, used to be performed by counting the words of a text that appear in a lexicon, which assigns each word to one class or marks it as neutral. In recent years, these lexicon-based methods have fallen out of favor and been replaced by computationally demanding fine-tuning techniques for encoder-only models such as BERT and by zero-shot classification using decoder-only models such as GPT-4. In this paper, we propose an alternative approach: Lex2Sent, which improves on classic lexicon methods but does not require any GPU or external hardware. To classify texts, we train embedding models to determine the distances between document embeddings and the embeddings of the parts of a suitable lexicon. We employ resampling, which results in a bagging effect that boosts classification performance. We show that our model outperforms lexica and provides a basis for a high-performing few-shot fine-tuning approach in the task of binary sentiment analysis.
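
The pipeline described in the abstract (embed documents and lexicon halves, compare distances, average over resampled models) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say which embedding model or resampling scheme is used, so the sketch swaps in a simple count-based word vector as a stand-in for a trained embedding model and assumes that each document's words are resampled with replacement. The lexicon, the corpus, and all function names (`train_word_vectors`, `embed`, `lex2sent_scores`, `classify`) are hypothetical.

```python
import math
import random

# Hypothetical toy lexicon halves; a real lexicon would be far larger.
POSITIVE = {"great", "wonderful", "good", "love"}
NEGATIVE = {"terrible", "awful", "boring", "bad"}

# Hypothetical toy corpus: documents 0 and 2 read positive, 1 and 3 negative.
DOCS = [
    "great film wonderful acting great story",
    "terrible film awful acting boring story",
    "good cast wonderful music love it",
    "bad cast awful music boring terrible",
]


def cosine_distance(u, v):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def train_word_vectors(token_docs):
    """Stand-in for embedding training (an assumption, not the paper's model):
    a word's vector holds its count in each resampled document, so words
    that occur in the same documents get similar vectors."""
    dim = len(token_docs)
    vectors = {}
    for i, tokens in enumerate(token_docs):
        for tok in tokens:
            vectors.setdefault(tok, [0.0] * dim)[i] += 1.0
    return vectors


def embed(tokens, vectors, dim):
    """Embed a collection of tokens as the sum of its known word vectors."""
    total = [0.0] * dim
    for tok in tokens:
        for j, x in enumerate(vectors.get(tok, ())):
            total[j] += x
    return total


def lex2sent_scores(docs, n_models=200, seed=0):
    """Average, over resampled models, of
    distance(doc, negative half) - distance(doc, positive half)."""
    rng = random.Random(seed)
    token_docs = [d.lower().split() for d in docs]  # assumes non-empty documents
    dim = len(docs)
    scores = [0.0] * dim
    for _ in range(n_models):
        # Assumed resampling scheme: draw each document's words with replacement.
        resampled = [rng.choices(toks, k=len(toks)) for toks in token_docs]
        vectors = train_word_vectors(resampled)
        pos_vec = embed(POSITIVE, vectors, dim)
        neg_vec = embed(NEGATIVE, vectors, dim)
        for i, toks in enumerate(resampled):
            doc_vec = embed(toks, vectors, dim)
            scores[i] += (cosine_distance(doc_vec, neg_vec)
                          - cosine_distance(doc_vec, pos_vec))
    return [s / n_models for s in scores]


def classify(docs, **kwargs):
    """Label a document positive when it is, on average, closer to the positive half."""
    return ["positive" if s > 0 else "negative"
            for s in lex2sent_scores(docs, **kwargs)]


if __name__ == "__main__":
    for doc, label in zip(DOCS, classify(DOCS)):
        print(f"{label:8s} {doc}")
```

Averaging the distance difference over many resampled models is what produces the bagging effect in this sketch: a single resample can drop the sentiment-bearing words of a document, but the mean over all models is much more stable. The value `n_models=200` is an arbitrary choice for the toy corpus.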
