Paper Title

Curating Social Media Data

Paper Author

Vaghani, Kushal

Paper Abstract

Social media platforms have democratized the pulse of the people in the modern era. Because these platforms are immensely popular and heavily used, data published on social media sites (e.g., Twitter, Facebook and Tumblr) is a rich ocean of information. Data-driven analytics of social imprints has therefore become a vital asset for organisations and governments seeking to improve their products and services. However, due to the dynamic and noisy nature of social media data, performing accurate analysis on raw data is a challenging task. A key requirement is to curate the raw data before it is fed into analytics pipelines. This curation process transforms the raw data into contextualized data and knowledge. We propose a data curation pipeline, namely CrowdCorrect, to enable analysts to cleanse and curate social data and prepare it for reliable analytics. Our pipeline provides automatic feature extraction from a corpus of social media data using existing in-house tools. Further, we offer a dual-correction mechanism using both automated and crowd-sourced approaches. The implementation of this pipeline also includes a set of tools for automatically creating micro-tasks to facilitate the contribution of crowd users in curating the raw data. For the purposes of this research, we use Twitter as our motivating social media data platform due to its popularity.
