Paper Title

Stance Detection Benchmark: How Robust Is Your Stance Detection?

Paper Authors

Schiller, Benjamin, Daxenberger, Johannes, Gurevych, Iryna

Paper Abstract

Stance Detection (StD) aims to detect an author's stance towards a certain topic or claim and has become a key component in applications like fake news detection, claim validation, and argument search. However, while stance is easily detected by humans, machine learning models clearly fall short of this task. Given the major differences in dataset sizes and framing of StD (e.g., number of classes and inputs), we introduce a StD benchmark that learns from ten StD datasets of various domains in a multi-dataset learning (MDL) setting, as well as from related tasks via transfer learning. Within this benchmark setup, we are able to present new state-of-the-art results on five of the datasets. Yet, the models still perform well below human capabilities, and even simple adversarial attacks severely hurt the performance of MDL models. Deeper investigation into this phenomenon suggests the existence of biases inherited from multiple datasets by design. Our analysis emphasizes the need to focus on robustness and de-biasing strategies in multi-task learning approaches. The benchmark dataset and code are made available.
