Title
Testing distributional assumptions of learning algorithms
Authors
Abstract
There are many high-dimensional function classes that have fast agnostic learning algorithms when assumptions on the distribution of examples can be made, such as Gaussianity or uniformity over the domain. But how can one be confident that data indeed satisfies such an assumption, so that one can trust the output quality of the agnostic learning algorithm? We propose a model by which to systematically study the design of tester-learner pairs $(\mathcal{A},\mathcal{T})$, such that if the distribution on examples in the data passes the tester $\mathcal{T}$, then one can safely trust the output of the agnostic learner $\mathcal{A}$ on the data. To demonstrate the power of the model, we apply it to the classical problem of agnostically learning halfspaces under the standard Gaussian distribution and present a tester-learner pair with combined run-time of $n^{\tilde{O}(1/ε^4)}$. This qualitatively matches that of the best known ordinary agnostic learning algorithms for this task. In contrast, finite-sample Gaussianity testers do not exist for the $L_1$ and EMD distance measures. A key step is to show that halfspaces are well-approximated by low-degree polynomials relative to distributions whose low-degree moments are close to those of a Gaussian. We also go beyond spherically symmetric distributions, and give a tester-learner pair for halfspaces under the uniform distribution on $\{0,1\}^n$ with combined run-time of $n^{\tilde{O}(1/ε^4)}$. This is achieved using polynomial approximation theory and critical index machinery. We also show there exist some well-studied settings where $2^{\tilde{O}(\sqrt{n})}$ run-time agnostic learning algorithms are available, yet the combined run-times of tester-learner pairs must be as high as $2^{Ω(n)}$. Accordingly, the design of tester-learner pairs is a research direction in its own right, independent of standard agnostic learning.