Paper title
Parallel Sampling for Efficient High-dimensional Bayesian Network Structure Learning
Authors
Abstract
Score-based algorithms that learn the structure of Bayesian networks can be used for both exact and approximate solutions. While approximate learning scales better with the number of variables, it can be computationally expensive in the presence of high-dimensional data. This paper describes an approximate algorithm that performs parallel sampling on Candidate Parent Sets (CPSs), and can be viewed as an extension of MINOBS, which is a state-of-the-art algorithm for structure learning from high-dimensional data. The modified algorithm, which we call Parallel Sampling MINOBS (PS-MINOBS), constructs the graph by sampling CPSs for each variable. Sampling is performed in parallel under the assumption that the distribution of CPSs is half-normal when ordered by Bayesian score for each variable. Sampling from a half-normal distribution ensures that the CPSs sampled are likely to be those which produce the higher scores. Empirical results show that, in most cases, the proposed algorithm discovers higher-scoring structures than MINOBS when both algorithms are restricted to the same runtime limit.
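The half-normal sampling idea in the abstract can be illustrated with a minimal sketch. This is not the PS-MINOBS implementation: the function name `sample_cps_indices`, the `sigma_fraction` parameter, the clipping of out-of-range draws, and the toy CPS list are all assumptions made for illustration. The sketch only shows how drawing rank indices from a half-normal distribution favours the CPSs at the top of a score-ordered list.

```python
import random


def sample_cps_indices(n_cps, n_samples, sigma_fraction=0.25, seed=None):
    """Draw rank indices into a list of CPSs sorted by descending score.

    Indices are the integer part of |N(0, sigma^2)| draws, so low ranks
    (high-scoring CPSs) are sampled most often. sigma_fraction is a
    hypothetical tuning knob, not a parameter taken from the paper.
    """
    rng = random.Random(seed)
    sigma = max(1.0, sigma_fraction * n_cps)
    indices = []
    for _ in range(n_samples):
        draw = abs(rng.gauss(0.0, sigma))  # half-normal draw
        # Assumption: clip rare out-of-range draws to the last rank.
        indices.append(min(int(draw), n_cps - 1))
    return indices


# Toy example: hypothetical CPSs for one variable, sorted best score first.
ranked_cps = [
    (("A", "B"), -10.2),
    (("A",), -11.0),
    (("B",), -11.7),
    ((), -13.4),
]
picks = sample_cps_indices(len(ranked_cps), n_samples=5, seed=0)
sampled = [ranked_cps[i][0] for i in picks]
```

Because each variable has its own ranked list, the draws for different variables are independent of one another, which is what makes this kind of sampling straightforward to run in parallel.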