Paper Title

Directions in Abusive Language Training Data: Garbage In, Garbage Out

Authors

Bertie Vidgen, Leon Derczynski

Abstract

Data-driven analysis and detection of abusive online content covers many different tasks, phenomena, contexts, and methodologies. This paper systematically reviews abusive language dataset creation and content in conjunction with an open website for cataloguing abusive language data. This collection of knowledge leads to a synthesis providing evidence-based recommendations for practitioners working with this complex and highly diverse data.