Paper Title
Non-Euclidean Universal Approximation
Paper Authors
Paper Abstract
Modifications to a neural network's input and output layers are often required to accommodate the specificities of most practical learning tasks. However, the impact of such changes on an architecture's approximation capabilities is largely not understood. We present general conditions describing feature and readout maps that preserve an architecture's ability to approximate any continuous function uniformly on compact sets. As an application, we show that if an architecture is capable of universal approximation, then modifying its final layer to produce binary values creates a new architecture capable of deterministically approximating any classifier. In particular, we obtain guarantees for deep CNNs and deep feed-forward networks. Our results also have consequences within the scope of geometric deep learning. Specifically, when the input and output spaces are Cartan-Hadamard manifolds, we obtain geometrically meaningful feature and readout maps satisfying our criteria. Consequently, commonly used non-Euclidean regression models between spaces of symmetric positive-definite matrices are extended to universal DNNs. The same result allows us to show that the hyperbolic feed-forward networks used for hierarchical learning are universal. Our result is also used to show that the common practice of randomizing all but the last two layers of a DNN produces a universal family of functions with probability one. We also provide conditions on a DNN's first (resp. last) few layers' connections and activation function which guarantee that these layers can have a width equal to the input (resp. output) space's dimension without negatively affecting the architecture's approximation capabilities.
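The general pattern the abstract describes, sandwiching a standard universal network between a feature map into a Euclidean space and a readout map back onto the target manifold, can be illustrated for the SPD-matrix case with the matrix logarithm and exponential. The sketch below is purely illustrative and is not the paper's construction: the helper names, the use of NumPy/SciPy, and the choice of the identity matrix as base point are assumptions.

```python
import numpy as np
from scipy.linalg import expm, logm

def spd_feature_map(X):
    """Feature map: the matrix logarithm sends an SPD matrix into the
    (Euclidean) vector space of symmetric matrices, flattened to a vector."""
    return logm(X).real.flatten()

def spd_readout_map(v, n):
    """Readout map: reshape to a matrix, symmetrize, and apply the matrix
    exponential, landing back on the SPD manifold."""
    A = v.reshape(n, n)
    A = 0.5 * (A + A.T)
    return expm(A)

def euclidean_mlp(v, weights, biases):
    """A plain feed-forward network on the flattened tangent-space
    representation (ReLU hidden layers, linear output layer)."""
    h = v
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(W @ h + b, 0.0)
    return weights[-1] @ h + biases[-1]

def non_euclidean_regressor(X, weights, biases, n):
    """Compose readout ∘ MLP ∘ feature: an SPD-to-SPD model whose
    approximation power is inherited from the Euclidean network in the middle."""
    return spd_readout_map(euclidean_mlp(spd_feature_map(X), weights, biases), n)

# Hypothetical usage: a 2x2 SPD input, one hidden layer of width 8.
# rng = np.random.default_rng(0)
# weights = [rng.normal(size=(8, 4)), rng.normal(size=(4, 8))]
# biases  = [rng.normal(size=8), rng.normal(size=4)]
# Y = non_euclidean_regressor(np.array([[2.0, 0.5], [0.5, 1.0]]), weights, biases, n=2)
```

The same composition works for any Cartan-Hadamard target by replacing the matrix log/exp with the Riemannian log/exp maps at a chosen base point; the paper's results concern which such feature and readout maps preserve universality.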