Probabilities of the Third Type: Statistical Relational Learning and Reasoning with Relative Frequencies

Paper title

Probabilities of the Third Type: Statistical Relational Learning and Reasoning with Relative Frequencies

Author

Weitkämper, Felix

Abstract

Dependencies on the relative frequency of a state in the domain are common when modelling probabilistic dependencies on relational data. For instance, the likelihood of a school closure during an epidemic might depend on the proportion of infected pupils exceeding a threshold. Often, rather than depending on discrete thresholds, dependencies are continuous: for instance, the likelihood of any one mosquito bite transmitting an illness depends on the proportion of carrier mosquitoes. Current approaches usually only consider probabilities over possible worlds rather than over domain elements themselves. The recently introduced lifted Bayesian networks for conditional probability logic are an exception: they express discrete dependencies on probabilistic data. We introduce functional lifted Bayesian networks (FLBN), a formalism that explicitly incorporates continuous dependencies on relative frequencies into statistical relational artificial intelligence, and compare and contrast them with lifted Bayesian networks for conditional probability logic. Incorporating relative frequencies is not only beneficial to modelling; it also provides a more rigorous approach to learning problems where training and test or application domains have different sizes. To this end, we provide a representation of the asymptotic probability distributions induced by functional lifted Bayesian networks on domains of increasing sizes. Since that representation has well-understood scaling behaviour across domain sizes, it can be used to estimate parameters for a large domain consistently from randomly sampled subpopulations. Furthermore, we show that in parametric families of FLBN, convergence is uniform in the parameters, which ensures a meaningful dependence of the asymptotic probabilities on the parameters of the model.
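
The mosquito example in the abstract can be made concrete with a toy sketch. This is an informal illustration only, not the FLBN formalism from the paper: the linear risk function, the per-bite risk of 0.5, the domain of 100,000 mosquitoes and the sample of 2,000 are all assumptions chosen for the example. It shows a probability that depends continuously on a relative frequency, and how the relative frequency in a randomly sampled subpopulation approximates the one in the full domain.

```python
import random


def relative_frequency(flags):
    """Proportion of domain elements for which the flag is true."""
    return sum(flags) / len(flags)


def transmission_probability(carrier_fraction, per_carrier_bite_risk=0.5):
    """Hypothetical continuous dependency: the risk of one bite transmitting
    the illness scales linearly with the fraction of carrier mosquitoes."""
    return per_carrier_bite_risk * carrier_fraction


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical domain: 100,000 mosquitoes, of which 30% are carriers.
domain = [i < 30_000 for i in range(100_000)]
rng.shuffle(domain)

# Relative frequency in the full domain and in a random subpopulation.
full_fraction = relative_frequency(domain)
sample = rng.sample(domain, 2_000)
sample_fraction = relative_frequency(sample)

p_full = transmission_probability(full_fraction)
p_sample = transmission_probability(sample_fraction)

print(f"carrier fraction: full={full_fraction:.3f}, sample={sample_fraction:.3f}")
print(f"bite transmission probability: full={p_full:.3f}, sample={p_sample:.3f}")
```

Because the risk function is continuous in the carrier fraction, a small sampling error in the fraction yields only a small error in the resulting probability. A hard threshold, as in the school closure example, would not have this property near the threshold.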
