Paper Title

Stress Test Evaluation of Transformer-based Models in Natural Language Understanding Tasks

Paper Authors

Carlos Aspillaga, Andrés Carvallo, Vladimir Araujo

Paper Abstract

There has been significant progress in recent years in the field of Natural Language Processing thanks to the introduction of the Transformer architecture. Current state-of-the-art models, with their large number of parameters and pre-training on massive text corpora, have shown impressive results on several downstream tasks. Many researchers have studied previous (non-Transformer) models to understand their actual behavior under different scenarios, showing that these models exploit clues or flaws in the datasets, and that slight perturbations of the input data can severely degrade their performance. In contrast, recent models have not been systematically tested with adversarial examples to demonstrate their robustness under severe stress conditions. For that reason, this work evaluates three Transformer-based models (RoBERTa, XLNet, and BERT) on Natural Language Inference (NLI) and Question Answering (QA) tasks to determine whether they are more robust or whether they share the same flaws as their predecessors. Our experiments reveal that RoBERTa, XLNet, and BERT are more robust than recurrent neural network models under stress tests on both NLI and QA tasks. Nevertheless, they are still very fragile and exhibit various unexpected behaviors, revealing that there is still room for improvement in this field.
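The "slight perturbations on the input data" mentioned in the abstract can be illustrated with a minimal character-level noise function that swaps two adjacent inner characters of a word. This is an illustrative sketch of one common stress-test perturbation, not the exact procedure used in the paper:

```python
import random

def swap_adjacent_chars(text, seed=0):
    """Perturb one word by swapping two adjacent inner characters.

    A toy character-level stress test: the sentence stays readable to
    a human, but the token-level input seen by a model changes.
    """
    rng = random.Random(seed)
    words = text.split()
    # Only perturb words long enough to have two inner characters.
    candidates = [i for i, w in enumerate(words) if len(w) > 3]
    if not candidates:
        return text
    i = rng.choice(candidates)
    w = words[i]
    j = rng.randrange(1, len(w) - 2)          # inner position
    words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    return " ".join(words)
```

A perturbed premise or question produced this way can then be fed to an NLI or QA model to check whether its prediction flips under noise the original training data never contained.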
