Deconstructing Distributions: A Pointwise Framework of Learning

Paper Title

Deconstructing Distributions: A Pointwise Framework of Learning

Authors

Gal Kaplun, Nikhil Ghosh, Saurabh Garg, Boaz Barak, Preetum Nakkiran

Abstract

In machine learning, we traditionally evaluate the performance of a single model, averaged over a collection of test inputs. In this work, we propose a new approach: we measure the performance of a collection of models when evaluated on a single input point. Specifically, we study a point's profile: the relationship between models' average performance on the test distribution and their pointwise performance on this individual point. We find that profiles can yield new insights into the structure of both models and data, in-distribution as well as out-of-distribution. For example, we empirically show that real data distributions consist of points with qualitatively different profiles. On one hand, there are "compatible" points with strong correlation between pointwise and average performance. On the other hand, there are points with weak and even negative correlation: cases where improving overall model accuracy actually hurts performance on these inputs. We prove that these experimental observations are inconsistent with the predictions of several simplified models of learning proposed in prior work. As an application, we use profiles to construct a dataset we call CIFAR-10-NEG: a subset of CINIC-10 such that, for standard models, accuracy on CIFAR-10-NEG is negatively correlated with accuracy on the CIFAR-10 test set. This illustrates, for the first time, an out-of-distribution (OOD) dataset that completely inverts "accuracy-on-the-line" (Miller, Taori, Raghunathan, Sagawa, Koh, Shankar, Liang, Carmon, and Schmidt 2021).
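The idea of a point's profile can be illustrated with a short sketch. This is not the authors' code. It assumes we already have 0/1 correctness records for a collection of models, both on the test set and on some candidate points. It also reduces each profile to a single Pearson correlation between a model's global test accuracy and its correctness on the point. That single number is an illustrative assumption; the paper studies the full relationship, and pointwise performance could instead be measured with, say, a softmax probability. The function names (`profile_correlations`, `select_negative_points`) and the toy data are hypothetical. Keeping the points with negative correlation only mimics, in spirit, how a subset like CIFAR-10-NEG might be filtered. The actual selection procedure is not described here.

```python
from math import sqrt
from typing import List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences.

    Returns 0.0 when either sequence is constant, where the correlation
    is undefined (an illustrative convention, not from the paper).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def global_accuracies(test_correct: List[List[int]]) -> List[float]:
    """Average test accuracy per model; test_correct[m][i] is 1 if model m is right on test input i."""
    return [sum(row) / len(row) for row in test_correct]


def profile_correlations(test_correct: List[List[int]],
                         point_correct: List[List[int]]) -> List[float]:
    """For each candidate point k, correlate the models' global accuracy
    with their 0/1 correctness on that point (point_correct[m][k])."""
    acc = global_accuracies(test_correct)
    num_points = len(point_correct[0])
    return [pearson(acc, [row[k] for row in point_correct]) for k in range(num_points)]


def select_negative_points(correlations: Sequence[float], threshold: float = 0.0) -> List[int]:
    """Hypothetical filter: indices of points whose profile correlation is below threshold."""
    return [k for k, c in enumerate(correlations) if c < threshold]


# Toy data: 4 hypothetical models with test accuracies 0.25, 0.5, 0.75, 1.0.
test_correct = [
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
]
# Correctness of the same models on 3 hypothetical candidate points:
# point 0 is solved only by the better models ("compatible"),
# point 1 is solved only by the worse models (negative correlation),
# point 2 is solved by every model (constant, so correlation is set to 0.0).
point_correct = [
    [0, 1, 1],
    [0, 1, 1],
    [1, 0, 1],
    [1, 0, 1],
]

corrs = profile_correlations(test_correct, point_correct)
print([round(c, 3) for c in corrs])   # [0.894, -0.894, 0.0]
print(select_negative_points(corrs))  # [1]
```

With real data, the rows would come from many trained models that span a range of test accuracies. The toy matrices above only show the shape of the computation.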
