Paper title
Bayesian Complementary Kernelized Learning for Multidimensional Spatiotemporal Data
Authors
Abstract
Probabilistic modeling of multidimensional spatiotemporal data is critical to many real-world applications. As real-world spatiotemporal data often exhibit complex dependencies that are nonstationary and nonseparable, developing effective and computationally efficient statistical models to accommodate nonstationary/nonseparable processes containing both long-range and short-scale variations becomes a challenging task, in particular for large-scale datasets with various corruption/missing structures. In this paper, we propose a new statistical framework -- Bayesian Complementary Kernelized Learning (BCKL) -- to achieve scalable probabilistic modeling for multidimensional spatiotemporal data. To effectively characterize complex dependencies, BCKL integrates two complementary approaches -- kernelized low-rank tensor factorization and short-range spatiotemporal Gaussian processes (GPs). Specifically, we use a multi-linear low-rank factorization component to capture the global/long-range correlations in the data and introduce an additive short-scale GP based on compactly supported kernel functions to characterize the remaining local variabilities. We develop an efficient Markov chain Monte Carlo (MCMC) algorithm for model inference and evaluate the proposed BCKL framework on both synthetic and real-world spatiotemporal datasets. Our experimental results show that BCKL offers superior performance in providing accurate posterior means and high-quality uncertainty estimates, confirming the importance of both global and local components in modeling spatiotemporal data.
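
The additive structure described in the abstract (a low-rank global component plus a short-scale GP with a compactly supported kernel) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation, and it does not include their MCMC inference. The two-dimensional space-by-time grid, the squared-exponential kernel for the factor columns, the Wendland kernel for the local part, the separable local covariance, and all numeric settings (rank, lengthscales, support radius, noise level) are illustrative choices that do not come from the abstract.

```python
import numpy as np


def se_kernel(x, lengthscale):
    """Squared-exponential kernel on 1-D inputs (smooth, long-range)."""
    d = np.abs(x[:, None] - x[None, :])
    return np.exp(-0.5 * (d / lengthscale) ** 2)


def wendland_kernel(x, support):
    """Compactly supported Wendland kernel: exactly zero beyond `support`."""
    r = np.abs(x[:, None] - x[None, :]) / support
    return np.where(r < 1.0, (1.0 - r) ** 4 * (4.0 * r + 1.0), 0.0)


def chol(K, jitter=1e-6):
    """Cholesky factor with a small diagonal jitter for numerical stability."""
    return np.linalg.cholesky(K + jitter * np.eye(K.shape[0]))


rng = np.random.default_rng(0)
s = np.linspace(0.0, 1.0, 40)  # hypothetical spatial coordinates
t = np.linspace(0.0, 1.0, 60)  # hypothetical time stamps
rank = 3                       # assumed factorization rank

# Global part: low-rank factorization whose factor columns are smooth GP draws.
U = chol(se_kernel(s, 0.3)) @ rng.standard_normal((s.size, rank))
V = chol(se_kernel(t, 0.3)) @ rng.standard_normal((t.size, rank))
global_part = U @ V.T

# Local part: one draw from a GP whose covariance is the Kronecker product of
# two compactly supported kernels (a separable form, assumed here for brevity).
Ks = wendland_kernel(s, 0.1)
Kt = wendland_kernel(t, 0.1)
local_part = 0.3 * chol(Ks) @ rng.standard_normal((s.size, t.size)) @ chol(Kt).T

# Observations: global + local + i.i.d. Gaussian noise.
noise = 0.05 * rng.standard_normal((s.size, t.size))
Y = global_part + local_part + noise

print("data shape:", Y.shape)
print("fraction of exact zeros in Ks:", np.mean(Ks == 0.0))
```

Because the Wendland kernel vanishes beyond its support radius, `Ks` and `Kt` are banded, which is the property that makes sparse linear algebra applicable to the local component on large grids. The dense Cholesky calls above are used only to keep the sketch short.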