Integrated Time Series Summarization and Prediction Algorithm and its Application to COVID-19 Data Mining

Title

Integrated Time Series Summarization and Prediction Algorithm and its Application to COVID-19 Data Mining

Authors

Plessen, Mogens Graf

Abstract

This paper proposes a simple method that extracts, from a set of multiple related time series, a compressed representation of each time series based on statistics of the entire set. This is achieved by a hierarchical algorithm that first generates an alphabet of shapelets based on the segmentation of the centroids of clustered data, and then assigns labels of these shapelets to the segmentation of each individual time series via nearest-neighbor search, using unconstrained dynamic time warping as the distance measure to handle non-uniform time series lengths. Thereby, a sequence of labels is assigned to each time series. Completion of the last label sequence permits prediction of individual time series. The proposed method is evaluated on two global COVID-19 datasets: first, for the number of daily net cases (daily new infections minus daily recoveries), and second, for the number of daily deaths attributed to COVID-19, as of April 27, 2020. The first dataset involves 249 time series for different countries, each of length 96. The second dataset involves 264 time series, each of length 96. Based on anomalies detected in the available data, a decentralized exit strategy from lockdowns is advocated.
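The labeling step described in the abstract (assigning shapelet labels to the segments of a time series via nearest-neighbor search under unconstrained dynamic time warping) might be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: it assumes the shapelet alphabet and the segmentation of each time series are already given, uses the absolute difference as the local DTW cost, and the function names `dtw_distance` and `assign_labels`, the toy alphabet (`rise`, `flat`, `fall`) and the segments are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Unconstrained DTW distance (no warping window), so a and b may
    have different lengths. Local cost: absolute difference (assumption)."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat b[j-1]
                                 cost[i][j - 1],      # repeat a[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def assign_labels(segments: List[Sequence[float]],
                  alphabet: Dict[str, Sequence[float]]) -> List[str]:
    """Give each segment the label of its nearest shapelet under DTW
    (hypothetical helper; the alphabet is assumed to be precomputed)."""
    return [min(alphabet, key=lambda label: dtw_distance(seg, alphabet[label]))
            for seg in segments]


# Hypothetical toy alphabet and segments of unequal length.
alphabet = {
    "rise": [0.0, 1.0, 2.0, 3.0],
    "flat": [1.0, 1.0, 1.0, 1.0],
    "fall": [3.0, 2.0, 1.0, 0.0],
}
segments = [[0.0, 0.5, 1.5, 2.5, 3.0], [1.0, 1.1, 0.9], [4.0, 2.0, 0.0]]

if __name__ == "__main__":
    print(assign_labels(segments, alphabet))  # ['rise', 'flat', 'fall']
```

Leaving the warping path unconstrained is what lets a five-point segment be compared directly with a four-point shapelet without resampling, which is the role the abstract gives to DTW when series lengths are non-uniform.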
