Paper Title
Efficient and Sparse Neural Networks by Pruning Weights in a Multiobjective Learning Approach
Paper Authors
Paper Abstract
Overparameterization and overfitting are common concerns when designing and training deep neural networks, which are often counteracted by pruning and regularization strategies. However, these strategies remain secondary to most learning approaches and suffer from time- and computation-intensive procedures. We suggest a multiobjective perspective on the training of neural networks by treating their prediction accuracy and the network complexity as two individual objective functions in a biobjective optimization problem. As a showcase example, we use the cross entropy as a measure of the prediction accuracy while adopting an l1-penalty function to assess the total cost (or complexity) of the network parameters. The latter is combined with an intra-training pruning approach that reinforces complexity reduction and requires only marginal extra computational cost. From the perspective of multiobjective optimization, this is a truly large-scale optimization problem. We compare two different optimization paradigms: On the one hand, we adopt a scalarization-based approach that transforms the biobjective problem into a series of weighted-sum scalarizations. On the other hand, we implement stochastic multi-gradient descent algorithms that generate a single Pareto optimal solution without requiring or using preference information. In the first case, favorable knee solutions are identified by repeated training runs with adaptively selected scalarization parameters. Preliminary numerical results on exemplary convolutional neural networks confirm that large reductions in the complexity of neural networks with negligible loss of accuracy are possible.
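
The weighted-sum scalarization and the intra-training pruning described in the abstract can be illustrated with a short, hypothetical sketch. The following PyTorch-style example is only an assumption-laden illustration, not the authors' implementation: the function names scalarized_loss and prune_small_weights, the trade-off parameter lam, and the fixed magnitude-pruning threshold are invented here for clarity. The scalarized loss combines the cross-entropy objective with an l1-penalty over all network parameters, and the magnitude-pruning step is a simple stand-in for the paper's intra-training pruning approach.

import torch
import torch.nn as nn

def scalarized_loss(model, outputs, targets, lam):
    # Objective 1: prediction accuracy, measured by the cross entropy
    ce = nn.functional.cross_entropy(outputs, targets)
    # Objective 2: network complexity, measured by the l1-norm of all parameters
    l1 = sum(p.abs().sum() for p in model.parameters())
    # Weighted-sum scalarization with trade-off parameter lam (assumed name and convention)
    return (1.0 - lam) * ce + lam * l1

def prune_small_weights(model, threshold=1e-3):
    # Illustrative magnitude pruning between training steps:
    # zero out parameters whose absolute value falls below the (assumed) threshold
    with torch.no_grad():
        for p in model.parameters():
            p.mul_((p.abs() >= threshold).float())

In this sketch, repeated training runs with different values of lam would trace out candidate points of the Pareto front, from which a knee solution could be selected; the stochastic multi-gradient descent paradigm mentioned in the abstract is not shown here.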