Paper Title
Reliable and Efficient Long-Term Social Media Monitoring
Authors
Abstract
Social media data are now widely used by academic researchers. However, long-term social media data collection projects, which typically collect data from public-use APIs, often run into problems when they rely on local-area network (LAN) servers to collect high-volume streaming data over long periods. In this technical report, we present a cloud-based infrastructure for data collection, pre-processing, and archiving, and argue that, at minimal cloud-computing cost, it mitigates or resolves the problems most commonly encountered when running such projects on LANs. We show how the approach works in different cloud computing architectures and how to adapt it to collect streaming data from other social media platforms.