Paper title
Causal Balancing for Domain Generalization
Paper authors
Paper abstract
While machine learning models rapidly advance the state of the art on various real-world tasks, out-of-domain (OOD) generalization remains a challenging problem given the vulnerability of these models to spurious correlations. We propose a balanced mini-batch sampling strategy that transforms a biased data distribution into a spurious-free balanced distribution, based on the invariance of the underlying causal mechanisms of the data generation process. We argue that Bayes optimal classifiers trained on such a balanced distribution are minimax optimal across a diverse enough environment space. We also provide an identifiability guarantee for the latent variable model of the proposed data generation process when enough training environments are used. Experiments on DomainBed demonstrate empirically that our method obtains the best performance across 20 baselines reported on the benchmark.
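
To make the idea of balanced mini-batch sampling concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract describes a method built on a latent variable model, whereas this sketch assumes, purely for illustration, that the spurious attribute is observed as a discrete value per example. The function name `balanced_minibatch` and the parameters `groups` and `per_cell` are hypothetical. Each (label, attribute) cell contributes the same number of examples, so the label and the attribute are uncorrelated within the batch.

```python
import random
from collections import defaultdict


def balanced_minibatch(labels, groups, per_cell, seed=0):
    """Return indices with `per_cell` draws from every observed (label, group) cell.

    Assumption for illustration: `groups` holds an observed discrete spurious
    attribute for each example. Sampling is with replacement, so small cells
    are oversampled rather than limiting the batch size.
    """
    rng = random.Random(seed)

    # Bucket example indices by their (label, group) cell.
    cells = defaultdict(list)
    for idx, (y, g) in enumerate(zip(labels, groups)):
        cells[(y, g)].append(idx)

    # Draw the same number of indices from each non-empty cell.
    batch = []
    for key in sorted(cells):
        batch.extend(rng.choices(cells[key], k=per_cell))

    rng.shuffle(batch)
    return batch


# Biased toy data: label 0 mostly co-occurs with group 0, label 1 with group 1.
labels = [0] * 90 + [1] * 10 + [0] * 10 + [1] * 90
groups = [0] * 100 + [1] * 100
batch = balanced_minibatch(labels, groups, per_cell=8)
```

Only cells that actually occur in the data can be balanced this way; a (label, attribute) combination with no examples is simply absent from the batch.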