Gaussian Processes for Missing Value Imputation

Paper title

Gaussian Processes for Missing Value Imputation

Authors

Bahram Jafrasteh, Daniel Hernández-Lobato, Simón Pedro Lubián-López, Isabel Benavente-Fernández

Abstract

Missing values are common in many real-life datasets. However, most current machine learning methods cannot handle missing values, so these values must be imputed beforehand. Gaussian Processes (GPs) are non-parametric models with accurate uncertainty estimates that, combined with sparse approximations and stochastic variational inference, scale to large datasets. Sparse GPs can be used to compute a predictive distribution for missing data. Here, we present a hierarchical composition of sparse GPs that is used to predict missing values at each dimension using all the variables from the other dimensions. We call the approach missing GP (MGP). MGP can be trained simultaneously to impute all observed missing values. Specifically, it outputs a predictive distribution for each missing value that is then used in the imputation of other missing values. We evaluate MGP on one private clinical dataset and four UCI datasets with different percentages of missing values. We compare the performance of MGP with other state-of-the-art methods for imputing missing values, including variants based on sparse GPs and deep GPs. The results show that MGP performs significantly better.
