Paper Title
Active Learning for Regression with Aggregated Outputs
Paper Authors
Abstract
In some real-world applications, privacy protection or the difficulty of data collection prevents us from observing the individual output of each instance; we can observe only aggregated outputs, each summed over the multiple instances in a set. To reduce the labeling cost of training regression models on such aggregated data, we propose an active learning method that sequentially selects which sets to label, so that predictive performance improves with fewer labeled sets. As its selection criterion, the proposed method uses the mutual information, which quantifies how much observing an aggregated output reduces the uncertainty of the model parameters. When the output given an input is modeled with Bayesian linear basis functions, a family that includes approximated Gaussian processes and neural networks, the mutual information can be calculated efficiently in closed form. In experiments on various datasets, we demonstrate that the proposed method achieves better predictive performance with fewer labeled sets than existing methods.
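The closed-form selection criterion can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on assumptions the abstract does not state: a Gaussian belief over the weights w with covariance Sigma, independent Gaussian noise with variance sigma^2 on each instance, and an aggregated output equal to the plain sum over the set S. Under those assumptions the summed feature vector is phi_S = sum of phi(x_n) for n in S, the aggregated noise variance is |S| * sigma^2, and the standard Bayesian linear regression result gives I(w; y_S) = 0.5 * log(1 + phi_S^T Sigma phi_S / (|S| * sigma^2)). The function names below are hypothetical.

```python
import numpy as np


def aggregated_mutual_information(Phi_set, cov, noise_var):
    """Mutual information between the weights and one set's summed output.

    Phi_set   : (n_instances, n_basis) basis-function features of the set
    cov       : (n_basis, n_basis) current covariance of the weights
    noise_var : assumed per-instance Gaussian noise variance
    """
    phi_sum = Phi_set.sum(axis=0)                # features of the summed output
    agg_noise = Phi_set.shape[0] * noise_var     # noise variances add up
    return 0.5 * np.log1p(phi_sum @ cov @ phi_sum / agg_noise)


def posterior_update(mean, cov, Phi_set, y_sum, noise_var):
    """Rank-one Gaussian update after observing the aggregated label y_sum."""
    phi_sum = Phi_set.sum(axis=0)
    agg_noise = Phi_set.shape[0] * noise_var
    cov_phi = cov @ phi_sum
    gain = cov_phi / (agg_noise + phi_sum @ cov_phi)
    new_mean = mean + gain * (y_sum - phi_sum @ mean)
    new_cov = cov - np.outer(gain, cov_phi)
    return new_mean, new_cov


def select_set(candidate_sets, cov, noise_var):
    """Return the index of the candidate set with the largest mutual information."""
    scores = [aggregated_mutual_information(P, cov, noise_var)
              for P in candidate_sets]
    return int(np.argmax(scores)), scores
```

An active learning loop under these assumptions would alternate `select_set`, a query for the chosen set's aggregated label, and `posterior_update`. In this sketch the mutual information equals half the drop in the log-determinant of the weight covariance, which makes a convenient sanity check.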