Paper Title
A general method for estimating the prevalence of Influenza-Like-Symptoms with Wikipedia data
Authors
Abstract
Influenza is an acute, seasonal respiratory disease that affects millions of people worldwide and causes thousands of deaths in Europe alone. Estimating the impact of an illness on a given country quickly and reliably is essential for planning and organizing effective countermeasures, and such estimates are now possible by leveraging unconventional data sources such as web searches and page visits. In this study, we show that Wikipedia page views for a selected group of articles, combined with machine learning models, can yield accurate estimates of influenza-like illness incidence in four European countries: Italy, Germany, Belgium, and the Netherlands. We propose a novel language-agnostic method, based on two algorithms, Personalized PageRank and CycleRank, that automatically selects the most relevant Wikipedia pages to monitor without the need for expert supervision. We then show that our model reaches state-of-the-art results by comparing it with previous solutions.
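To make the page-selection idea concrete, the following is a minimal sketch of Personalized PageRank computed by power iteration on a toy link graph. It is not the authors' implementation: the page names, the link structure, the helper `top_pages`, and the damping value `alpha=0.85` are all assumptions chosen for illustration, and CycleRank is not shown.

```python
def personalized_pagerank(links, seed, alpha=0.85, iterations=100):
    """Power-iteration Personalized PageRank with restarts to one seed page.

    links: dict mapping a page to the list of pages it links to.
    seed:  the page every random walk restarts from.
    alpha: probability of following a link (assumed value, not from the paper).
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    nodes = sorted(nodes)

    # Start with all probability mass on the seed page.
    scores = {n: 0.0 for n in nodes}
    scores[seed] = 1.0

    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        for n in nodes:
            out = links.get(n, [])
            if out:
                # Split this page's mass evenly among its outgoing links.
                share = alpha * scores[n] / len(out)
                for t in out:
                    nxt[t] += share
            else:
                # Dangling page: send its mass back to the seed.
                nxt[seed] += alpha * scores[n]
        # Restart step: the remaining mass jumps back to the seed.
        nxt[seed] += 1.0 - alpha
        scores = nxt
    return scores


def top_pages(scores, k):
    """Return the k highest-scoring pages (hypothetical selection helper)."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy link graph with made-up links between illustrative page names.
links = {
    "Influenza": ["Fever", "Cough", "Virus"],
    "Fever": ["Influenza", "Cough"],
    "Cough": ["Influenza"],
    "Virus": ["Influenza", "Biology"],
    "Biology": ["Virus"],
    "Football": ["Biology"],
}

scores = personalized_pagerank(links, seed="Influenza")
selected = top_pages(scores, k=3)
```

Because every walk restarts at the seed, pages that cannot be reached from it (here the hypothetical `Football` page) receive a score of zero, which is what lets a seed-based ranking filter out unrelated articles without manual curation.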