Bayesian methods promise to fix many shortcomings of deep learning, but they are impractical and rarely match the performance of standard methods, let alone improve them. In this paper, we demonstrate practical training of deep networks with natural-gradient variational inference. By applying techniques such as batch normalisation, data augmentation, and distributed training, we achieve similar performance in about the same number of epochs as the Adam optimiser, even on large datasets such as ImageNet. Importantly, the benefits of Bayesian principles are preserved: predictive probabilities are well-calibrated and uncertainties on out-of-distribution data are improved. This work enables practical deep learning while preserving benefits of Bayesian principles. A PyTorch implementation will be available as a plug-and-play optimiser.

本研究利用自然梯度变分推理方法对深度神经网络进行实用性的训练，并通过批归一化、数据扩充和分布式训练等技术获得类似于Adam优化器的性能，即使在ImageNet等大型数据集上也是如此。此外，本研究验证了使用贝叶斯原理的好处: 预测概率被很好地校准，超出分布数据的不确定性得到改善，并且持续学习性能得到提高。该研究旨在实现实用性的深度学习，并同时保留贝叶斯原理的好处。最后提供了一个 PyTorch 的实现优化器。

基于贝叶斯原理的实用深度学习