StereoKG: Data-Driven Knowledge Graph Construction for Cultural Knowledge and Stereotypes

Paper title

StereoKG: Data-Driven Knowledge Graph Construction for Cultural Knowledge and Stereotypes

Authors

Awantee Deshpande, Dana Ruiter, Marius Mosbach, Dietrich Klakow

Abstract

Analyzing ethnic or religious bias is important for improving fairness, accountability, and transparency of natural language processing models. However, many techniques rely on human-compiled lists of bias terms, which are expensive to create and are limited in coverage. In this study, we present a fully data-driven pipeline for generating a knowledge graph (KG) of cultural knowledge and stereotypes. Our resulting KG covers 5 religious groups and 5 nationalities and can easily be extended to include more entities. Our human evaluation shows that the majority (59.2%) of non-singleton entries are coherent and complete stereotypes. We further show that performing intermediate masked language model training on the verbalized KG leads to a higher level of cultural awareness in the model and has the potential to increase classification performance on knowledge-crucial samples on a related task, i.e., hate speech detection.
