Title
On counterfactual inference with unobserved confounding
Authors
Abstract
Given an observational study with $n$ independent but heterogeneous units, our goal is to learn the counterfactual distribution for each unit using only one $p$-dimensional sample per unit containing covariates, interventions, and outcomes. Specifically, we allow for unobserved confounding that introduces statistical biases between interventions and outcomes and exacerbates the heterogeneity across units. Modeling the conditional distribution of the outcomes as an exponential family, we reduce learning the unit-level counterfactual distributions to learning $n$ exponential family distributions with heterogeneous parameters and only one sample per distribution. We introduce a convex objective that pools all $n$ samples to jointly learn all $n$ parameter vectors, and provide a unit-wise mean squared error bound that scales linearly with the metric entropy of the parameter space. For example, when the parameters are $s$-sparse linear combinations of $k$ known vectors, the error is $O(s\log k/p)$. En route, we derive sufficient conditions for compactly supported distributions to satisfy the logarithmic Sobolev inequality. As an application of the framework, our results enable consistent imputation of sparsely missing covariates.
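To make the $O(s\log k/p)$ rate and the "one sample per unit" setting concrete, here is a toy sketch under strong simplifying assumptions. It looks at a single unit whose $p$-dimensional sample is $x \sim \mathcal N(\theta, I_p)$ (a very simple exponential family with independent coordinates), with $\theta = B a$ for a known random $p \times k$ dictionary $B$ and an $s$-sparse coefficient vector $a$, and estimates $a$ by the Lasso solved with ISTA (proximal gradient). This is NOT the paper's pooled convex objective, which handles dependent coordinates and unobserved confounding; the names `lasso_ista`, `simulate_unit`, `per_coordinate_mse` and the regularization constant are hypothetical choices for illustration only.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (entrywise soft thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_ista(B, x, lam, n_iter=1000):
    """Minimise (1/(2p)) * ||x - B a||^2 + lam * ||a||_1 with ISTA."""
    p, k = B.shape
    a = np.zeros(k)
    # Lipschitz constant of the gradient of the smooth (quadratic) part.
    L = np.linalg.norm(B, 2) ** 2 / p
    for _ in range(n_iter):
        grad = B.T @ (B @ a - x) / p
        a = soft_threshold(a - grad / L, lam / L)
    return a


def simulate_unit(p, k, s, rng):
    """Draw a known dictionary B, an s-sparse a, and ONE sample x ~ N(B a, I_p)."""
    B = rng.standard_normal((p, k))  # columns have squared norm about p
    a = np.zeros(k)
    support = rng.choice(k, size=s, replace=False)
    a[support] = rng.choice([-1.0, 1.0], size=s)
    theta = B @ a
    x = theta + rng.standard_normal(p)  # a single p-dimensional observation
    return B, theta, x


def per_coordinate_mse(p, k, s, seed=0):
    """Squared error of the estimated theta, averaged over its p coordinates."""
    rng = np.random.default_rng(seed)
    B, theta, x = simulate_unit(p, k, s, rng)
    # Hypothetical choice: lambda of order sqrt(log k / p) (noise level of B^T w / p).
    lam = 2.0 * np.sqrt(2.0 * np.log(k) / p)
    a_hat = lasso_ista(B, x, lam)
    return float(np.mean((B @ a_hat - theta) ** 2))


if __name__ == "__main__":
    k, s = 200, 3
    for p in (200, 500, 2000):
        rate = s * np.log(k) / p  # the s log k / p scaling from the abstract
        print(p, per_coordinate_mse(p, k, s), rate)
```

In this toy model the error should shrink roughly in proportion to $s\log k/p$ as $p$ grows with $k$ and $s$ fixed, up to constants that the sketch does not try to match; it only illustrates why a single high-dimensional sample can suffice when the parameter lies in a low-complexity set.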