Paper Title

A Curriculum Learning Approach for Multi-domain Text Classification Using Keyword Weight Ranking

Authors

Zilin Yuan, Yinghui Li, Yangning Li, Rui Xie, Wei Wu, Hai-Tao Zheng

Abstract

Text classification is a classic NLP task, but it has two prominent shortcomings. On the one hand, text classification is deeply domain-dependent: a classifier trained on the corpus of one domain may not perform well in another domain. On the other hand, text classification models require a large amount of annotated data for training, yet for some domains there may not be enough annotated data available. It is therefore valuable to investigate how to efficiently utilize text data from different domains to improve model performance across domains. Some multi-domain text classification models are trained with adversarial training to extract features shared among all domains alongside the specific features of each domain. We note that the distinctness of the domain-specific features varies across domains, so in this paper we propose a curriculum learning strategy based on keyword weight ranking to improve the performance of multi-domain text classification models. Experimental results on the Amazon review and FDU-MTL datasets show that our curriculum learning strategy effectively improves the performance of multi-domain text classification models based on adversarial learning and outperforms state-of-the-art methods.
