Paper Title


"Call me sexist, but...": Revisiting Sexism Detection Using Psychological Scales and Adversarial Samples

Paper Authors

Mattia Samory, Indira Sen, Julian Kohne, Fabian Floeck, Claudia Wagner

Paper Abstract


Research has focused on automated methods to effectively detect sexism online. Although overt sexism seems easy to spot, its subtle forms and manifold expressions are not. In this paper, we outline the different dimensions of sexism by grounding them in their implementation in psychological scales. From the scales, we derive a codebook for sexism in social media, which we use to annotate existing and novel datasets, surfacing their limitations in breadth and validity with respect to the construct of sexism. Next, we leverage the annotated datasets to generate adversarial examples, and test the reliability of sexism detection methods. Results indicate that current machine learning models pick up on a very narrow set of linguistic markers of sexism and do not generalize well to out-of-domain examples. Yet, including diverse data and adversarial examples at training time results in models that generalize better and that are more robust to artifacts of data collection. By providing a scale-based codebook and insights regarding the shortcomings of the state-of-the-art, we hope to contribute to the development of better and broader models for sexism detection, including reflections on theory-driven approaches to data collection.
