Paper Title

Continual Learning with Extended Kronecker-factored Approximate Curvature

Paper Authors

Janghyeon Lee, Hyeong Gwon Hong, Donggyu Joo, Junmo Kim

Paper Abstract

We propose a quadratic penalty method for continual learning of neural networks that contain batch normalization (BN) layers. The Hessian of a loss function represents the curvature of the quadratic penalty function, and Kronecker-factored approximate curvature (K-FAC) is widely used to compute the Hessian of a neural network in practice. However, the approximation is not valid if there is dependence between examples, which is typically caused by BN layers in deep network architectures. We extend the K-FAC method so that inter-example relations are taken into account and the Hessian of deep neural networks can be properly approximated under practical assumptions. We also propose a method of weight merging and reparameterization to properly handle the statistical parameters of BN, which play a critical role in continual learning with BN, and a method for selecting hyperparameters without source task data. Our method outperforms baselines on the permuted MNIST task with BN layers and on sequential learning from the ImageNet classification task to fine-grained classification tasks with ResNet-50, without any explicit or implicit use of source task data for hyperparameter selection.
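As a rough sketch of the general idea (written in standard quadratic-penalty and K-FAC notation, not the paper's exact equations), the new-task loss is regularized around the previous task's solution $\theta^{*}$, with curvature taken from the Hessian $H$ of the previous loss; K-FAC approximates the Hessian block of layer $\ell$ as a Kronecker product of an input-activation factor $A_{\ell}$ and a pre-activation-gradient factor $G_{\ell}$:

\[
\mathcal{L}(\theta) \;=\; \mathcal{L}_{\text{new}}(\theta) \;+\; \frac{\lambda}{2}\,(\theta - \theta^{*})^{\top} H\,(\theta - \theta^{*}),
\qquad
H_{\ell} \;\approx\; A_{\ell} \otimes G_{\ell},
\]

where $\lambda$ is the penalty-strength hyperparameter, which the paper proposes to select without access to source task data. The paper's extension modifies the K-FAC factors so that they remain valid when BN introduces dependence between examples within a mini-batch.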
