论文标题
任何时间因果发现的一个迭代步骤
A Single Iterative Step for Anytime Causal Discovery
论文作者
论文摘要
我们提出了一种声音和完整的算法,用于在可能存在潜在混杂因素和选择偏见的情况下从观察到的非相关数据中恢复因果图。我们依靠因果马尔可夫和忠诚的假设,并通过在观察到的变量之间进行一系列条件独立性(CI)测试来恢复基本因果图的等效类别。我们提出了一个迭代应用的单一步骤,使得在任何迭代之后,从所得图中需要的独立性和因果关系是正确的,并且在连续的迭代中变得更加信息。从本质上讲,我们将设置的CI条件的大小将其距离与结果图上的测试节点的距离联系起来。每次迭代都通过执行具有比前迭代更大的条件集的CI测试来完善骨骼和方向。在迭代中,CI测试的条件集由指定搜索距离内的节点构建,这些条件集的大小等于此搜索距离。然后,该算法与条件集大小一起迭代增加搜索距离。因此,每次迭代都会完善图形,该图是通过具有较小条件集的先前迭代来恢复的 - 具有较高的统计能力。我们证明,与FCI算法相比,我们的算法需要更少的CI测试和较小的状况集。这既可以使用完美的CI Oracle恢复真实的基础图,又可以使用有限的观察到的数据来准确估算图形。
We present a sound and complete algorithm for recovering causal graphs from observed, non-interventional data, in the possible presence of latent confounders and selection bias. We rely on the causal Markov and faithfulness assumptions and recover the equivalence class of the underlying causal graph by performing a series of conditional independence (CI) tests between observed variables. We propose a single step that is applied iteratively, such that the independence and causal relations entailed from the resulting graph, after any iteration, is correct and becomes more informative with successive iteration. Essentially, we tie the size of the CI condition set to its distance from the tested nodes on the resulting graph. Each iteration refines the skeleton and orientation by performing CI tests having condition sets that are larger than in the preceding iteration. In an iteration, condition sets of CI tests are constructed from nodes that are within a specified search distance, and the sizes of these condition sets is equal to this search distance. The algorithm then iteratively increases the search distance along with the condition set sizes. Thus, each iteration refines a graph, that was recovered by previous iterations having smaller condition sets -- having a higher statistical power. We demonstrate that our algorithm requires significantly fewer CI tests and smaller condition sets compared to the FCI algorithm. This is evident for both recovering the true underlying graph using a perfect CI oracle, and accurately estimating the graph using limited observed data.