Paper Title

Table Scraps: An Actionable Framework for Multi-Table Data Wrangling From An Artifact Study of Computational Journalism

Authors

Stephen Kasica, Charles Berret, Tamara Munzner

Abstract

For the many journalists who use data and computation to report the news, data wrangling is an integral part of their work. Despite an abundance of literature on data wrangling in the context of enterprise data analysis, little is known about the specific operations, processes, and pain points journalists encounter while performing this tedious, time-consuming task. To better understand the needs of this user group, we conduct a technical observation study of 50 public repositories of data and analysis code authored by 33 professional journalists at 26 news organizations. We develop two detailed and cross-cutting taxonomies of data wrangling in computational journalism, for actions and for processes. We observe the extensive use of multiple tables, a notable gap in previous wrangling analyses. We develop a concise, actionable framework for general multi-table data wrangling that includes wrangling operations documented in our taxonomy that are without clear parallels in other work. This framework, the first to incorporate tables as first-class objects, will support future interactive wrangling tools for both computational journalism and general-purpose use. We assess the generative and descriptive power of our framework through discussion of its relationship to our set of taxonomies.
