Paper title
EmoMent: An Emotion Annotated Mental Health Corpus from two South Asian Countries
Paper authors
Abstract
People often use online media (e.g., Facebook, Reddit) as a platform to express psychological distress and seek support. State-of-the-art NLP techniques show strong potential for automatically detecting mental health issues from text. Research suggests that mental health issues are reflected in the emotions (e.g., sadness) indicated by a person's choice of language. We therefore developed a novel emotion-annotated mental health corpus (EmoMent), consisting of 2802 Facebook posts (14845 sentences) extracted from two South Asian countries: Sri Lanka and India. Three clinical psychology postgraduates annotated these posts into eight categories, including 'mental illness' (e.g., depression) and emotions (e.g., 'sadness', 'anger'). The EmoMent corpus achieved 'very good' inter-annotator agreement of 98.3% (i.e., the percentage of posts on which two or more annotators agreed) and a Fleiss' Kappa of 0.82. Our RoBERTa-based models achieved an F1 score of 0.76 for the first task (i.e., predicting a mental health condition from a post) and a macro-averaged F1 score of 0.77 for the second task (i.e., the extent of association of relevant posts with the categories defined in our taxonomy).
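
The two agreement figures quoted in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names `majority_agreement` and `fleiss_kappa` are hypothetical, the four toy items and their labels are invented for illustration, and the sketch assumes each item carries a single label from each of three annotators, which may differ from the actual annotation scheme.

```python
from collections import Counter


def majority_agreement(ratings):
    """Share of items on which at least two annotators chose the same label."""
    agreed = sum(
        1 for item in ratings if Counter(item).most_common(1)[0][1] >= 2
    )
    return agreed / len(ratings)


def fleiss_kappa(ratings):
    """Fleiss' kappa, assuming every item has the same number of ratings."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({label for item in ratings for label in item})
    # counts[i][j]: how many annotators assigned category j to item i
    counts = [[Counter(item)[c] for c in categories] for item in ratings]
    # Observed agreement per item, then averaged over items
    p_items = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items
    # Chance agreement from the overall category proportions
    totals = [sum(row[j] for row in counts) for j in range(len(categories))]
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals)
    return (p_bar - p_e) / (1 - p_e)


# Invented toy data: 4 posts, 3 annotators each (not taken from EmoMent)
toy_ratings = [
    ["sadness", "sadness", "sadness"],
    ["anger", "anger", "sadness"],
    ["anger", "anger", "anger"],
    ["sadness", "anger", "fear"],
]

print(majority_agreement(toy_ratings))
print(round(fleiss_kappa(toy_ratings), 3))
```

On this toy input the share of posts with a two-annotator majority is 0.75 while Fleiss' Kappa is about 0.268, which shows why the two numbers are reported separately: the first counts any majority as agreement, whereas the second corrects for agreement expected by chance.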