Time-series Imputation and Prediction with Bi-Directional Generative Adversarial Networks

Paper Title

Time-series Imputation and Prediction with Bi-Directional Generative Adversarial Networks

Authors

Mehak Gupta, Rahmatollah Beheshti

Abstract

Multivariate time-series data are used in many classification and regression predictive tasks, and recurrent models have been widely used for such tasks. Most common recurrent models assume that time-series data elements are of equal length and that the ordered observations are recorded at regular intervals. However, real-world time-series data have neither a similar length nor the same number of observations. They also have missing entries, which hinders the performance of predictive tasks. In this paper, we approach these issues by presenting a model for the combined task of imputing and predicting values for irregularly observed, varying-length time-series data with missing entries. Our proposed model (Bi-GAN) uses a bidirectional recurrent network in a generative adversarial setting. The generator is a bidirectional recurrent network that receives actual incomplete data and imputes the missing values. The discriminator attempts to discriminate between the actual and the imputed values in the output of the generator. Our model learns how to impute missing elements in between (imputation) or outside of the input time steps (prediction), hence working as an effective any-time prediction tool for time-series data. Our method has three advantages over state-of-the-art methods in the field: (a) a single model can be used for both imputation and prediction tasks; (b) it can perform the prediction task for time series of varying length with missing data; (c) it does not require knowing the observation and prediction time windows during training, which provides a flexible prediction window length for both long-term and short-term predictions. We evaluate our model on two public datasets and on another large real-world electronic health records dataset to impute and predict body mass index (BMI) values in children, and show its superior performance in both settings.
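
The abstract treats imputation and prediction as the same operation: filling masked positions that lie either between or after the observed time steps. The sketch below illustrates only that framing and is not the paper's architecture. The function names (`bidirectional_fill`, `impute`) are hypothetical, and a simple average of forward-carried and backward-carried observations stands in for the bidirectional recurrent generator. The last line shows the kind of per-position target a discriminator could be trained against (1 for actual values, 0 for imputed ones), which is an assumption about the setup rather than a detail given in the abstract.

```python
import numpy as np

def bidirectional_fill(x, mask):
    """Stand-in for a bidirectional generator (NOT the paper's network):
    average the last observed value seen from the left and from the right."""
    T = len(x)
    fwd = np.full(T, np.nan)
    bwd = np.full(T, np.nan)
    last = np.nan
    for t in range(T):                  # forward pass over time
        if mask[t]:
            last = x[t]
        fwd[t] = last
    last = np.nan
    for t in range(T - 1, -1, -1):      # backward pass over time
        if mask[t]:
            last = x[t]
        bwd[t] = last
    # If only one direction has seen an observation, use that direction alone.
    return np.where(np.isnan(fwd), bwd,
                    np.where(np.isnan(bwd), fwd, (fwd + bwd) / 2))

def impute(x, mask):
    """Keep observed entries; fill only the masked (missing) positions."""
    return np.where(mask, x, bidirectional_fill(x, mask))

# Illustrative values only. Index 1 is missing between observations
# (imputation); indices 3 and 4 lie after the last observation (prediction).
x = np.array([18.0, 0.0, 20.0, 0.0, 0.0])
mask = np.array([1, 0, 1, 0, 0], dtype=bool)

completed = impute(x, mask)

# Hypothetical discriminator target: 1 where the value is actual, 0 where imputed.
disc_target = mask.astype(float)
```

Because the mask alone decides which positions are filled, the same call covers both tasks, and the length of the prediction window is set simply by how many trailing positions are masked.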
