Title
The Risks of Invariant Risk Minimization
Authors
Abstract
Invariant Causal Prediction (Peters et al., 2016) is a technique for out-of-distribution generalization which assumes that some aspects of the data distribution vary across the training set but that the underlying causal mechanisms remain constant. Recently, Arjovsky et al. (2019) proposed Invariant Risk Minimization (IRM), an objective based on this idea for learning deep, invariant features of data which are a complex function of latent variables; many alternatives have subsequently been suggested. However, formal guarantees for all of these works are severely lacking. In this paper, we present the first analysis of classification under the IRM objective, as well as under these recently proposed alternatives, within a fairly natural and general model. In the linear case, we show simple conditions under which the optimal solution succeeds or, more often, fails to recover the optimal invariant predictor. We furthermore present the first results in the non-linear regime: we demonstrate that IRM can fail catastrophically unless the test data are sufficiently similar to the training distribution, which is precisely the issue it was intended to solve. Thus, in this setting we find that IRM and its alternatives fundamentally do not improve over standard Empirical Risk Minimization.
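The IRM objective the abstract refers to can be made concrete with a small sketch. Below is a minimal numpy illustration of the IRMv1 relaxation from Arjovsky et al. (2019): each environment contributes its empirical risk plus a penalty equal to the squared gradient of that risk with respect to a scalar classifier w, evaluated at w = 1. The linear featurizer `phi`, the logistic loss, and the names `irmv1_objective` and `envs` are illustrative choices, not taken from this paper.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def irmv1_objective(phi, envs, lam=1.0):
    """IRMv1-style objective for a linear featurizer phi with logistic loss.

    envs: list of (X, y) pairs, one per training environment, y in {-1, +1}.
    The penalty for each environment is the squared gradient of its risk
    with respect to a scalar classifier w, evaluated at w = 1.
    """
    total_risk, total_penalty = 0.0, 0.0
    for X, y in envs:
        z = X @ phi                         # learned features, shape (n,)
        margins = y * z                     # classifier fixed at w = 1
        risk = np.mean(np.log1p(np.exp(-margins)))
        # d/dw [mean log(1 + exp(-y * w * z))] at w = 1
        #   = mean(-y * z * sigmoid(-y * z))
        grad_w = np.mean(-y * z * sigmoid(-margins))
        total_risk += risk
        total_penalty += grad_w ** 2
    return total_risk + lam * total_penalty
```

With `lam = 0` this reduces to ordinary ERM summed over environments; increasing `lam` pushes the optimizer toward features whose optimal classifier is the same in every environment, which is the invariance the paper analyzes.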