Paper Title

Revisiting Transformer-based Models for Long Document Classification

Paper Authors

Xiang Dai, Ilias Chalkidis, Sune Darkner, Desmond Elliott

Paper Abstract

The recent literature in text classification is biased towards short text sequences (e.g., sentences or paragraphs). In real-world applications, multi-page multi-paragraph documents are common and they cannot be efficiently encoded by vanilla Transformer-based models. We compare different Transformer-based Long Document Classification (TrLDC) approaches that aim to mitigate the computational overhead of vanilla transformers to encode much longer text, namely sparse attention and hierarchical encoding methods. We examine several aspects of sparse attention (e.g., size of local attention window, use of global attention) and hierarchical (e.g., document splitting strategy) transformers on four document classification datasets covering different domains. We observe a clear benefit from being able to process longer text, and, based on our results, we derive practical advice of applying Transformer-based models on long document classification tasks.
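
To make the two sparse-attention aspects mentioned in the abstract concrete (size of the local attention window, use of global attention), below is a minimal sketch using the Hugging Face transformers Longformer classes. The attention window of 256, the four-label setup, the example document, and the checkpoint choice are illustrative placeholders, not the configurations evaluated in the paper.

    # A minimal sketch of sparse-attention document classification with Longformer.
    # Window size, label count, and checkpoint are illustrative assumptions.
    import torch
    from transformers import (
        LongformerConfig,
        LongformerForSequenceClassification,
        LongformerTokenizerFast,
    )

    checkpoint = "allenai/longformer-base-4096"
    tokenizer = LongformerTokenizerFast.from_pretrained(checkpoint)

    # The size of the local attention window is one of the aspects the paper examines;
    # 256 here is just an example value.
    config = LongformerConfig.from_pretrained(checkpoint, attention_window=256, num_labels=4)
    # The classification head is newly initialised and would still need fine-tuning.
    model = LongformerForSequenceClassification.from_pretrained(checkpoint, config=config)

    document = "A multi-page, multi-paragraph document goes here ..."
    inputs = tokenizer(document, truncation=True, max_length=4096, return_tensors="pt")

    # Use of global attention is the other aspect: here only the [CLS] token
    # attends globally, a common default for document classification.
    global_attention_mask = torch.zeros_like(inputs["input_ids"])
    global_attention_mask[:, 0] = 1

    with torch.no_grad():
        logits = model(**inputs, global_attention_mask=global_attention_mask).logits
    predicted_label = logits.argmax(dim=-1).item()
    print(predicted_label)

The hierarchical alternative discussed in the paper would instead split the document into segments, encode each segment with a standard Transformer encoder, and aggregate the segment representations before classification; the sketch above only covers the sparse-attention route.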
