Description Based Text Classification with Reinforcement Learning

Paper title

Description Based Text Classification with Reinforcement Learning

Authors

Duo Chai, Wei Wu, Qinghong Han, Fei Wu, Jiwei Li

Abstract

Text classification is usually divided into two stages: text feature extraction and classification. In this standard formalization, categories are merely represented as indexes in the label vocabulary, and the model lacks explicit instructions on what to classify. Inspired by the current trend of formalizing NLP problems as question answering tasks, we propose a new framework for text classification in which each category label is associated with a category description. Descriptions are generated by hand-crafted templates or by abstractive/extractive models learned with reinforcement learning. The concatenation of the description and the text is fed to the classifier, which decides whether or not the current label should be assigned to the text. The proposed strategy forces the model to attend to the text most salient with respect to the label, which can be regarded as a hard version of attention, leading to better performance. We observe significant performance boosts over strong baselines on a wide range of text classification tasks, including single-label classification, multi-label classification, and multi-aspect sentiment analysis.
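The description-plus-text formulation in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the label descriptions, the `[SEP]` separator, the function names, and the token-overlap `toy_scorer` are all assumptions made for illustration. In the paper's setting the scorer would be a trained binary classifier, and descriptions could also come from a learned abstractive or extractive model rather than a fixed template.

```python
from typing import Callable, Dict, List

# Hypothetical hand-crafted descriptions; labels and wording are illustrative only.
DESCRIPTIONS: Dict[str, str] = {
    "sports": "this text is about sports such as football matches and players",
    "business": "this text is about business such as markets stocks and companies",
}

SEP = " [SEP] "  # assumed separator between description and text


def build_input(description: str, text: str) -> str:
    """Concatenate a label description with the text into one classifier input."""
    return f"{description}{SEP}{text}"


def toy_scorer(pair: str) -> float:
    """Stand-in for a trained binary classifier.

    Returns the fraction of text tokens that also occur in the description.
    """
    description, text = pair.split(SEP, 1)
    desc_tokens = set(description.lower().split())
    text_tokens = text.lower().split()
    if not text_tokens:
        return 0.0
    return sum(tok in desc_tokens for tok in text_tokens) / len(text_tokens)


def score_labels(
    text: str,
    descriptions: Dict[str, str] = DESCRIPTIONS,
    scorer: Callable[[str], float] = toy_scorer,
) -> Dict[str, float]:
    """Score every (description, text) pair independently, one per label."""
    return {
        label: scorer(build_input(desc, text))
        for label, desc in descriptions.items()
    }


def classify_single(text: str) -> str:
    """Single-label case: keep the label whose pair scores highest."""
    scores = score_labels(text)
    return max(scores, key=scores.get)


def classify_multi(text: str, threshold: float = 0.5) -> List[str]:
    """Multi-label case: keep every label whose pair clears the threshold."""
    scores = score_labels(text)
    return [label for label, score in scores.items() if score >= threshold]


if __name__ == "__main__":
    print(classify_single("the football players won the match"))
    print(classify_multi("stocks and football"))
```

Because each label is judged through its own yes/no decision, the same scoring loop serves both single-label classification (take the highest score) and multi-label classification (apply a threshold), which is one reason the binary formulation is convenient.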
