Paper title
Disentangling Neural Architectures and Weights: A Case Study in Supervised Classification
Paper authors
Paper abstract
The history of deep learning has shown that human-designed, problem-specific networks can greatly improve the classification performance of general neural models. In most practical cases, however, choosing the optimal architecture for a given task remains a challenging problem. Recent architecture-search methods can automatically build neural models with strong performance, but they fail to fully account for the interaction between neural architecture and weights. This work investigates the problem of disentangling the role of the neural structure from that of its edge weights, by showing that well-trained architectures may not need any link-specific fine-tuning of the weights. We compare the performance of such weight-free networks (in our case, binary networks with {0, 1}-valued weights) with random, weight-agnostic, pruned, and standard fully connected networks. To find the optimal weight-agnostic network, we use a novel and computationally efficient method that translates the hard architecture-search problem into a feasible optimization problem. More specifically, we view the optimal task-specific architecture as the optimal configuration of a binary network with {0, 1}-valued weights, which can be found through an approximate gradient descent strategy. Theoretical convergence guarantees for the proposed algorithm are obtained by bounding the error in the gradient approximation, and its practical performance is evaluated on two real-world data sets. To measure the structural similarities between different architectures, we use a novel spectral approach that allows us to underline the intrinsic differences between real-valued networks and weight-free architectures.
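To make the idea of searching over {0, 1}-valued weights with an approximate gradient more concrete, here is a minimal sketch for a single linear layer with a logistic loss. It is not the algorithm from the paper: the abstract does not specify the gradient approximation, so this sketch assumes a straight-through style update, in which real-valued latent parameters are thresholded at 0.5 in the forward pass and the gradient with respect to the binary weights is applied directly to the latent parameters. The function names, the threshold, the learning rate, and the toy data are all hypothetical.

```python
import numpy as np


def binarize(theta):
    """Threshold real-valued latent parameters at 0.5 to get {0, 1} weights."""
    return (theta > 0.5).astype(np.float64)


def logistic_loss(X, y, w):
    """Mean logistic loss of a linear model with weights w and labels y in {0, 1}."""
    logits = X @ w
    return float(np.mean(np.logaddexp(0.0, logits) - y * logits))


def train_binary_layer(X, y, steps=200, lr=0.1, seed=0):
    """Search for {0, 1}-valued weights with an approximate gradient (assumed scheme)."""
    rng = np.random.default_rng(seed)
    # Latent real-valued parameters; only their thresholded version is used to predict.
    theta = rng.uniform(0.0, 1.0, size=X.shape[1])
    for _ in range(steps):
        w = binarize(theta)                      # forward pass uses binary weights only
        p = 1.0 / (1.0 + np.exp(-(X @ w)))       # predicted probabilities
        grad_w = X.T @ (p - y) / len(y)          # exact gradient w.r.t. the binary weights
        # Approximation: treat the thresholding as the identity in the backward pass,
        # so the gradient w.r.t. w is applied to theta, then theta is kept in [0, 1].
        theta = np.clip(theta - lr * grad_w, 0.0, 1.0)
    return binarize(theta)


if __name__ == "__main__":
    # Toy data (hypothetical): labels depend on three of the eight input features.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(256, 8))
    true_w = np.array([1, 0, 1, 0, 0, 1, 0, 0], dtype=np.float64)
    y = (X @ true_w > 0).astype(np.float64)

    w = train_binary_layer(X, y)
    print("learned binary weights:", w)
    print("logistic loss:", logistic_loss(X, y, w))
```

In this sketch a weight of 1 means the link is kept and 0 means it is removed, so the learned vector describes an architecture and carries no link-specific tuning. The error introduced by ignoring the thresholding in the backward pass is the kind of quantity a convergence analysis would need to bound.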