Sharing pattern submodels for prediction with missing values

Paper title

Sharing pattern submodels for prediction with missing values

Authors

Lena Stempfle, Ashkan Panahi, Fredrik D. Johansson

Abstract

Missing values are unavoidable in many applications of machine learning and present challenges both during training and at test time. When variables are missing in recurring patterns, fitting separate pattern submodels has been proposed as a solution. However, fitting models independently does not make efficient use of all available data. Conversely, fitting a single shared model to the full data set relies on imputation, which often leads to biased results when missingness depends on unobserved factors. We propose an alternative approach, called sharing pattern submodels, which i) makes predictions that are robust to missing values at test time, ii) maintains or improves the predictive power of pattern submodels, and iii) has a short description, enabling improved interpretability. Parameter sharing is enforced through sparsity-inducing regularization, which we prove leads to consistent estimation. Finally, we give conditions for when a sharing model is optimal, even when both missingness and the target outcome depend on unobserved variables. Classification and regression experiments on synthetic and real-world data sets demonstrate that our models achieve a favorable tradeoff between pattern specialization and information sharing.
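
The abstract's central idea, pattern submodels that share parameters through sparsity-inducing regularization, can be illustrated with a small sketch. This is not the authors' implementation. It assumes a linear regression setting with squared loss, a shared coefficient vector `theta`, one deviation vector per missingness pattern (`delta`), and an L1 penalty on the deviations fitted by proximal gradient descent. The function names, the `lam` parameter, the absence of an intercept, and the choice of optimizer are all hypothetical and chosen only for illustration.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of the L1 norm (elementwise shrinkage toward zero)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def fit_sharing_submodels(X, y, lam=0.1, n_iter=2000):
    """Fit shared coefficients plus sparse pattern-specific deviations.

    X is an (n, d) array in which np.nan marks a missing entry.
    Assumed objective (illustrative, not taken from the paper):
        (1 / 2n) * sum_i (y_i - x_i^T (theta + delta[pattern(i)]))^2
        + lam * sum_m ||delta[m]||_1
    Missing entries are zero-filled so they contribute nothing to a prediction.
    """
    mask = np.isnan(X)
    X0 = np.where(mask, 0.0, X)
    # Group rows by their missingness pattern.
    patterns, idx = np.unique(mask, axis=0, return_inverse=True)
    idx = idx.ravel()
    n, d = X0.shape
    theta = np.zeros(d)
    delta = np.zeros((len(patterns), d))
    # Conservative step size from an upper bound on the gradient's Lipschitz constant.
    lipschitz = max(2.0 * np.linalg.norm(X0, 2) ** 2 / n, 1e-12)
    step = 1.0 / lipschitz
    for _ in range(n_iter):
        resid = X0 @ theta + np.sum(X0 * delta[idx], axis=1) - y
        grad_theta = X0.T @ resid / n
        grad_delta = np.zeros_like(delta)
        # Accumulate each row's gradient into the slot of its own pattern.
        np.add.at(grad_delta, idx, X0 * resid[:, None] / n)
        theta = theta - step * grad_theta
        # Only the deviations are penalized, so only they are shrunk.
        delta = soft_threshold(delta - step * grad_delta, step * lam)
    return theta, delta, patterns


def predict(X, theta, delta, patterns):
    """Predict with the shared model plus the deviation of each row's pattern.

    Rows whose pattern was not seen during fitting use the shared model only.
    """
    mask = np.isnan(X)
    X0 = np.where(mask, 0.0, X)
    preds = X0 @ theta
    for j, p in enumerate(patterns):
        rows = np.all(mask == p, axis=1)
        preds[rows] += X0[rows] @ delta[j]
    return preds
```

In this sketch a large `lam` drives every deviation to zero, which recovers a single shared model, while `lam = 0` lets each pattern specialize freely. Intermediate values trade off between the two, which is the kind of balance the abstract describes.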
