Adapting Deep Learning for Sentiment Classification of Code-Switched Informal Short Text

Paper title

Adapting Deep Learning for Sentiment Classification of Code-Switched Informal Short Text

Authors

Shakeel, Muhammad Haroon; Karim, Asim

Abstract

Nowadays, an abundance of short text is being generated that uses nonstandard writing styles influenced by regional languages. Such informal and code-switched content is under-resourced in terms of labeled datasets and language models, even for popular tasks like sentiment classification. In this work, we (1) present a labeled dataset called MultiSenti for sentiment classification of code-switched informal short text, (2) explore the feasibility of adapting resources from a resource-rich language for an informal one, and (3) propose a deep learning-based model for sentiment classification of code-switched informal short text. We aim to achieve this without any lexical normalization, language translation, or code-switching indication. The performance of the proposed models is compared with three existing multilingual sentiment classification models. The results show that the proposed model performs better in general, and that adapting character-based embeddings yields equivalent performance while being computationally more efficient than training word-based domain-specific embeddings.
