Title
Extract Dynamic Information To Improve Time Series Modeling: a Case Study with Scientific Workflow
Authors
Abstract
In modeling time series data, we often need to augment the existing data records to increase modeling accuracy. In this work, we describe a number of techniques for extracting dynamic information about the current state of a large scientific workflow; these techniques could be generalized to other types of applications. The specific task to be modeled is the time needed to transfer a file from an experimental facility to a data center. The key idea of our approach is to find recent past data transfer events that match the current event in some respects. Tests showed that we could identify recent events matching some recorded properties and reduce the prediction error by about 12% compared with similar models that use only static features. We additionally explored an application-specific technique to extract information about the data production process, and were able to reduce the average prediction error by 44%.
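The abstract does not state which properties are matched or which features are derived from the matched events. As a minimal sketch of the general idea, assuming a hypothetical record schema (`Transfer` with `start`, `end`, `size_bytes`, and a `source` attribute used for matching) and a hypothetical helper `recent_throughput`, one dynamic feature could be the mean throughput of the most recent matching transfers that had already finished when the current transfer started. This is an illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Optional, Sequence


@dataclass
class Transfer:
    """One file-transfer record (hypothetical schema for illustration)."""
    start: float                 # submission time, seconds
    size_bytes: int
    source: str                  # hypothetical attribute used for matching
    end: Optional[float] = None  # completion time; None for the event being predicted


def recent_throughput(
    current: Transfer,
    history: Sequence[Transfer],
    match: Callable[[Transfer, Transfer], bool],
    window: float = 3600.0,
    k: int = 5,
) -> Optional[float]:
    """Mean throughput (bytes/s) of the k most recent past transfers that
    match `current`, finished before it started, and finished no more than
    `window` seconds earlier. Returns None when no such transfer exists."""
    done = [
        t for t in history
        if t.end is not None
        and t.end > t.start                # skip zero or negative durations
        and t.end <= current.start         # only information known at prediction time
        and current.start - t.end <= window
        and match(t, current)
    ]
    if not done:
        return None
    done.sort(key=lambda t: t.end, reverse=True)  # most recent completions first
    return mean(t.size_bytes / (t.end - t.start) for t in done[:k])


def same_source(a: Transfer, b: Transfer) -> bool:
    """Hypothetical matching rule: transfers from the same source."""
    return a.source == b.source


history = [
    Transfer(start=0, end=10, size_bytes=1000, source="A"),   # 100 B/s
    Transfer(start=5, end=15, size_bytes=4000, source="B"),   # different source
    Transfer(start=12, end=20, size_bytes=3000, source="A"),  # 375 B/s
    Transfer(start=20, end=40, size_bytes=2000, source="A"),  # ends after current starts
]
current = Transfer(start=30, size_bytes=5000, source="A")

print(recent_throughput(current, history, same_source))  # 237.5
```

Restricting candidates to transfers that completed before the current one started keeps the feature computable at prediction time and avoids leaking future information into the model; the matching rule, window length, and `k` are free choices that would need tuning for a real workflow.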