We propose a quadratic penalty method for continual learning of neural networks that contain batch normalization (BN) layers. The Hessian of the loss function represents the curvature of the quadratic penalty, and Kronecker-factored approximate curvature (K-FAC) is widely used to approximate the Hessian of a neural network in practice. However, this approximation is invalid when examples depend on one another, a dependence typically introduced by BN layers in deep network architectures. We extend K-FAC so that inter-example relations are taken into account and the Hessian of deep neural networks can be properly approximated under practical assumptions. We also propose a weight merging and reparameterization method that properly handles the statistical parameters of BN, which is critical for continual learning with BN, and a method for selecting hyperparameters without source task data. Our method outperforms baselines on the permuted MNIST task with BN layers and on sequential learning from the ImageNet classification task to fine-grained classification tasks with ResNet-50, without any explicit or implicit use of source task data for hyperparameter selection.
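The quadratic penalty with a Kronecker-factored curvature can be sketched in a few lines of pure Python. This is a minimal illustration of standard per-layer K-FAC, not the authors' extended method. It assumes one fully connected layer with weight W of shape d_out × d_in and treats examples as independent. The helper names (second_moment, penalty_explicit, penalty_factored) are hypothetical. The curvature block of W is approximated as A ⊗ G, where A = E[a aᵀ] is taken over layer inputs and G = E[g gᵀ] over loss gradients with respect to the pre-activations. The code checks that (λ/2) vec(ΔW)ᵀ (A ⊗ G) vec(ΔW) equals (λ/2) tr(ΔWᵀ G ΔW A), with vec stacking columns.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain matrix product x @ y."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def transpose(x: Matrix) -> Matrix:
    return [list(row) for row in zip(*x)]


def second_moment(vectors: List[Vector]) -> Matrix:
    """E[v v^T] averaged over examples, assumed independent (standard K-FAC)."""
    n, d = len(vectors), len(vectors[0])
    return [[sum(v[i] * v[j] for v in vectors) / n for j in range(d)]
            for i in range(d)]


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product a ⊗ b."""
    p, q = len(b), len(b[0])
    return [[a[i // p][j // q] * b[i % p][j % q]
             for j in range(len(a[0]) * q)] for i in range(len(a) * p)]


def vec(x: Matrix) -> Vector:
    """Column-major vectorization: stack the columns of x."""
    return [x[i][j] for j in range(len(x[0])) for i in range(len(x))]


def penalty_explicit(delta: Matrix, a_cov: Matrix, g_cov: Matrix, lam: float) -> float:
    """(lam/2) vec(dW)^T (A ⊗ G) vec(dW), forming the full Kronecker matrix."""
    v, h = vec(delta), kron(a_cov, g_cov)
    return 0.5 * lam * sum(v[i] * h[i][j] * v[j]
                           for i in range(len(v)) for j in range(len(v)))


def penalty_factored(delta: Matrix, a_cov: Matrix, g_cov: Matrix, lam: float) -> float:
    """Same value computed as (lam/2) tr(dW^T G dW A); A ⊗ G is never built."""
    m = matmul(matmul(transpose(delta), g_cov), matmul(delta, a_cov))
    return 0.5 * lam * sum(m[i][i] for i in range(len(m)))


# Toy layer with 2 outputs and 3 inputs; 4 made-up examples.
acts = [[1.0, 0.5, -1.0], [0.0, 1.0, 2.0], [1.5, -0.5, 0.0], [0.5, 0.5, 0.5]]
grads = [[0.2, -0.1], [0.0, 0.3], [-0.4, 0.1], [0.1, 0.1]]
a_cov, g_cov = second_moment(acts), second_moment(grads)

w_old = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.2]]  # hypothetical weights after the source task
w_new = [[0.3, 0.1, 0.3], [0.1, -0.1, 0.0]]  # hypothetical weights during the new task
delta = [[n - o for n, o in zip(rn, ro)] for rn, ro in zip(w_new, w_old)]

p_explicit = penalty_explicit(delta, a_cov, g_cov, lam=1.0)
p_factored = penalty_factored(delta, a_cov, g_cov, lam=1.0)
print(p_explicit, p_factored)  # the two values agree up to rounding
```

Storing A and G takes d_in² + d_out² numbers instead of (d_in·d_out)² for the explicit matrix, which is why the factored form is the practical one. In a continual-learning setting the factors would typically be estimated on the source task and held fixed while ΔW is measured from the source-task weights.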

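We propose a quadratic penalty method for continual learning of neural networks that contain batch normalization layers. By taking inter-example relations into account, the K-FAC method is extended so that the Hessian of deep neural networks can be properly approximated in practical settings. We also propose a weight merging and reparameterization method that properly handles the statistical parameters of batch normalization. Experimental results show that the method outperforms the baseline algorithms on the reported benchmarks.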
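The inter-example dependence that motivates the extension can be seen numerically. The sketch below assumes a single scalar feature and BN without learned scale and shift. It estimates the cross-example derivative ∂y_0/∂x_1 by central finite differences. The derivative is nonzero for BN, because every output uses the batch mean and variance, and exactly zero for a layer that processes each example on its own. The function names are hypothetical. A nonzero cross term of this kind is what a per-example factorization such as standard K-FAC leaves out. The sketch does not implement the extended K-FAC itself.

```python
import math
from typing import Callable, List

Vector = List[float]


def batch_norm(xs: Vector, eps: float = 1e-5) -> Vector:
    """Normalize one feature with batch mean/variance (no learned scale/shift)."""
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    return [(x - mu) / math.sqrt(var + eps) for x in xs]


def per_example_layer(xs: Vector) -> Vector:
    """A layer that treats each example on its own (here y = 2x + 1)."""
    return [2.0 * x + 1.0 for x in xs]


def cross_jacobian(f: Callable[[Vector], Vector], xs: Vector,
                   i: int, j: int, h: float = 1e-6) -> float:
    """Central finite-difference estimate of d f(xs)[i] / d xs[j]."""
    up, down = list(xs), list(xs)
    up[j] += h
    down[j] -= h
    return (f(up)[i] - f(down)[i]) / (2.0 * h)


batch = [1.0, 2.0, 3.0, 6.0]
bn_dep = cross_jacobian(batch_norm, batch, i=0, j=1)
lin_dep = cross_jacobian(per_example_layer, batch, i=0, j=1)
print(bn_dep, lin_dep)  # BN gives a nonzero coupling; the per-example layer gives 0.0
```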

Continual Learning with Extended Kronecker-factored Approximate Curvature