Causal discovery under a confounder blanket

Paper title

Causal discovery under a confounder blanket

Authors

David S. Watson, Ricardo Silva

Abstract

Inferring causal relationships from observational data is rarely straightforward, but the problem is especially difficult in high dimensions. For these applications, causal discovery algorithms typically require parametric restrictions or extreme sparsity constraints. We relax these assumptions and focus on an important but more specialized problem, namely recovering the causal order among a subgraph of variables known to descend from some (possibly large) set of confounding covariates, i.e. a $\textit{confounder blanket}$. This is useful in many settings, for example when studying a dynamic biomolecular subsystem with genetic data providing background information. Under a structural assumption called the $\textit{confounder blanket principle}$, which we argue is essential for tractable causal discovery in high dimensions, our method accommodates graphs of low or high sparsity while maintaining polynomial time complexity. We present a structure learning algorithm that is provably sound and complete with respect to a so-called $\textit{lazy oracle}$. We design inference procedures with finite sample error control for linear and nonlinear systems, and demonstrate our approach on a range of simulated and real-world datasets. An accompanying $\texttt{R}$ package, $\texttt{cbl}$, is available from $\texttt{CRAN}$.
