Paper Title
Cannot Predict Comment Volume of a News Article before (a few) Users Read It
Authors
Abstract
Many news outlets allow users to comment on articles about daily world events. News articles are the seeds that spark users' interest in contributing content, i.e., comments. An article may attract apathetic user engagement (a few dozen comments) or spontaneous, fervent user engagement (thousands of comments). In this paper, we study the problem of predicting the total number of user comments a news article will receive. Our main insight is that the early dynamics of user comments contribute the most to an accurate prediction, while article-specific factors have surprisingly little influence. This appears to be an interesting and understudied phenomenon: collective social behavior at a news outlet shapes user response and may even downplay the content of an article. We compile and analyze a large number of features, both established in the literature and novel. The features span a broad spectrum of facets, including news article and comment contents, temporal dynamics, sentiment/linguistic features, and user behaviors. We show that the early arrival rate of comments is the best indicator of the eventual number of comments. We conduct an in-depth analysis of this feature across several dimensions, such as news outlets and news article categories. We show that both the relationship between the early rate and the final number of comments and the prediction accuracy vary considerably across news outlets and news article categories (e.g., politics, sports, or health).
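To make the notion of an early arrival rate concrete, the following is a minimal sketch of how such a feature might be computed from comment timestamps. The abstract does not define the observation window or the unit of the rate, so the one-hour window, the per-minute unit, and the function name `early_arrival_rate` are all illustrative assumptions rather than the authors' definition.

```python
from datetime import datetime, timedelta


def early_arrival_rate(published, comment_times, window=timedelta(hours=1)):
    """Return comments per minute within `window` after publication.

    Hypothetical helper: the one-hour default and the per-minute unit
    are assumptions made for illustration, not values from the paper.
    """
    if window <= timedelta(0):
        raise ValueError("window must be positive")
    cutoff = published + window
    # Count only comments posted at or after publication and before the cutoff.
    early = sum(1 for t in comment_times if published <= t < cutoff)
    return early / (window.total_seconds() / 60.0)


# Made-up example: six comments, four of them within the first hour.
published = datetime(2020, 1, 1, 12, 0)
comments = [published + timedelta(minutes=m) for m in (2, 5, 9, 30, 61, 240)]
rate = early_arrival_rate(published, comments)
print(f"early arrival rate: {rate:.4f} comments/minute")
```

A rate computed this way could then serve as one input to a predictor of the final comment count; how it is combined with the other features is not specified in the abstract.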