You Are What You Tweet: Profiling Users by Past Tweets to Improve Hate Speech Detection

Title

You Are What You Tweet: Profiling Users by Past Tweets to Improve Hate Speech Detection

Authors

Prateek Chaudhry, Matthew Lease

Abstract

Hate speech detection research has predominantly focused on purely content-based methods, without exploiting any additional context. We briefly critique pros and cons of this task formulation. We then investigate profiling users by their past utterances as an informative prior to better predict whether new utterances constitute hate speech. To evaluate this, we augment three Twitter hate speech datasets with additional timeline data, then embed this additional context into a strong baseline model. Promising results suggest merit for further investigation, though analysis is complicated by differences in annotation schemes and processes, as well as Twitter API limitations and data sharing policies.
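The abstract does not say how the timeline context is embedded into the baseline model. As a minimal sketch, assuming each tweet can be mapped to a fixed-size vector, one simple way to profile a user by past utterances is to average the vectors of their past tweets and concatenate that profile with the new tweet's vector before classification. The `embed` function here is a hypothetical hashed bag-of-words stand-in, and `DIM`, `profile_user`, and `features` are illustrative names; none of this is the authors' code or encoder.

```python
import hashlib
from typing import Dict, List

DIM = 8  # hypothetical embedding size, kept small for illustration


def embed(text: str, dim: int = DIM) -> List[float]:
    """Toy hashed bag-of-words vector (an assumed stand-in, not the paper's encoder)."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # Stable hash so bucket indices do not depend on Python's per-process hash seed.
        idx = int(hashlib.sha256(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[idx] += 1.0
    return vec


def profile_user(past_tweets: List[str], dim: int = DIM) -> List[float]:
    """Average the vectors of a user's past tweets; zero vector if the timeline is empty."""
    if not past_tweets:
        return [0.0] * dim
    total = [0.0] * dim
    for tweet in past_tweets:
        for i, v in enumerate(embed(tweet, dim)):
            total[i] += v
    return [v / len(past_tweets) for v in total]


def features(new_tweet: str, past_tweets: List[str]) -> List[float]:
    """Concatenate the new tweet's vector with the author's profile (the user-level prior)."""
    return embed(new_tweet) + profile_user(past_tweets)


if __name__ == "__main__":
    # Hypothetical timeline data keyed by user ID.
    timelines: Dict[str, List[str]] = {"user_a": ["first past tweet", "second past tweet"]}
    x = features("a new tweet to classify", timelines["user_a"])
    print(len(x))  # prints 16 (DIM for the tweet + DIM for the profile)
```

Averaging is only one possible aggregation; the resulting vector would then be fed to whatever classifier serves as the baseline, and users with no retrievable timeline (for example, deleted or suspended accounts) fall back to a zero profile.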
