Aggregated Multi-output Gaussian Processes with Knowledge Transfer Across Domains

Paper Title

Aggregated Multi-output Gaussian Processes with Knowledge Transfer Across Domains

Authors

Yusuke Tanaka, Toshiyuki Tanaka, Tomoharu Iwata, Takeshi Kurashima, Maya Okawa, Yasunori Akagi, Hiroyuki Toda

Abstract

Aggregate data often appear in fields such as socio-economics and public security. Such data are associated not with points but with supports (e.g., spatial regions in a city). Since the supports may have different granularities depending on the attribute (e.g., poverty rate and crime rate), modeling such data is not straightforward. This article offers a multi-output Gaussian process (MoGP) model that infers functions for attributes using multiple aggregate datasets of respective granularities. In the proposed model, the function for each attribute is assumed to be a dependent GP modeled as a linear mixing of independent latent GPs. We design an observation model with an aggregation process for each attribute; the process is an integral of the GP over the corresponding support. We also introduce a prior distribution of the mixing weights, which allows knowledge transfer across domains (e.g., cities) by sharing the prior. This is advantageous when the spatially aggregated dataset in a city is too coarse to interpolate; the proposed model can still make accurate predictions of attributes by utilizing aggregate datasets from other cities. The inference of the proposed model is based on variational Bayes, which enables one to learn the model parameters using the aggregate datasets from multiple domains. The experiments demonstrate that the proposed model outperforms the compared methods in the task of refining coarse-grained aggregate data on real-world datasets: time series of air pollutants in Beijing and various kinds of spatial datasets from New York City and Chicago.
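
The generative idea in the abstract (attribute functions as a linear mixing of independent latent GPs, observed only through aggregation over supports) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the one-dimensional input, the RBF kernel, the lengthscales, the random mixing weights, and the helper names `rbf_kernel`, `sample_latent_gps`, and `aggregate` are all assumptions made for illustration, and the integral over each support is approximated by an average over grid points.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale):
    """Squared-exponential kernel (an assumed choice, not taken from the paper)."""
    diff = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)

def sample_latent_gps(x, lengthscales, rng):
    """Draw one sample path from each independent zero-mean latent GP."""
    paths = []
    for ls in lengthscales:
        cov = rbf_kernel(x, x, ls) + 1e-6 * np.eye(len(x))  # jitter for stability
        chol = np.linalg.cholesky(cov)
        paths.append(chol @ rng.standard_normal(len(x)))
    return np.stack(paths)  # shape: (n_latent, n_points)

def aggregate(values, x, edges):
    """Average the function values falling in each interval defined by `edges`.

    This is a discretized, normalized stand-in for integrating over a support.
    """
    bins = np.digitize(x, edges[1:-1])  # bin index 0 .. len(edges) - 2
    return np.array([values[bins == b].mean() for b in range(len(edges) - 1)])

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 200)  # fine grid over a 1-D domain

# Two latent GPs mixed into three attribute functions (hypothetical sizes).
latent = sample_latent_gps(x, lengthscales=[0.5, 2.0], rng=rng)
weights = rng.normal(size=(3, 2))  # hypothetical mixing weights
functions = weights @ latent       # shape: (3, 200)

# Each attribute is observed at its own granularity: 2, 5, and 10 supports.
supports = [np.linspace(0.0, 10.0, n + 1) for n in (2, 5, 10)]
observations = [aggregate(f, x, e) for f, e in zip(functions, supports)]
```

Because both the mixing and the aggregation are linear operations, the aggregated observations remain jointly Gaussian, which is what lets datasets of different granularities inform one another. In the paper's setting the mixing weights additionally receive a prior shared across domains; the sketch above simply draws them at random.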
