Paper title
MentSum: A Resource for Exploring Summarization of Mental Health Online Posts
Paper authors
Paper abstract
Mental health remains a significant public health challenge worldwide. With the increasing popularity of online platforms, many people use them to share their mental health conditions, express their feelings, and seek help from the community and counselors. Some of these platforms, such as Reachout, are dedicated forums where users register to seek help. Others, such as Reddit, provide subreddits where users publicly but anonymously post about their mental health distress. Although posts vary in length, it is beneficial to provide a short but informative summary for fast processing by counselors. To facilitate research on summarizing mental health online posts, we introduce the Mental Health Summarization dataset, MentSum, containing over 24k carefully selected user posts from Reddit, along with their short user-written summaries (called TLDRs), in English from 43 mental health subreddits. This domain-specific dataset could be of interest not only for generating short summaries on Reddit, but also for generating summaries of posts on dedicated mental health forums such as Reachout. We further evaluate both extractive and abstractive state-of-the-art summarization baselines in terms of ROUGE scores, and finally conduct an in-depth human evaluation study of both user-written and system-generated summaries, highlighting challenges in this research.