Paper Title

Generative Datalog with Continuous Distributions

Authors

Martin Grohe, Benjamin Lucien Kaminski, Joost-Pieter Katoen, Peter Lindner

Abstract

Arguing for the need to combine declarative and probabilistic programming, Bárány et al. (TODS 2017) recently introduced a probabilistic extension of Datalog as a "purely declarative probabilistic programming language." We revisit this language and propose a more principled approach towards defining its semantics based on stochastic kernels and Markov processes - standard notions from probability theory. This allows us to extend the semantics to continuous probability distributions, thereby settling an open problem posed by Bárány et al. We show that our semantics is fairly robust, allowing both parallel execution and arbitrary chase orders when evaluating a program. We cast our semantics in the framework of infinite probabilistic databases (Grohe and Lindner, ICDT 2020), and show that the semantics remains meaningful even when the input of a probabilistic Datalog program is an arbitrary probabilistic database.
