Kazuki Osawa, Satoki Ishikawa, Rio Yokota, Shigang Li, Torsten Hoefler
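TL;DR: This paper presents an approach to gradient preconditioning built on the Automatic Second-order Differentiation Library (ASDL) for PyTorch, which allows a wide range of gradient preconditioning methods to be implemented and compared in a unified way.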
Abstract
Gradient preconditioning is a key technique for integrating second-order information into gradients in order to improve and extend gradient-based learning algorithms. In …
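To make the idea concrete, here is a minimal, hypothetical sketch (not ASDL's API) of preconditioned gradient descent, theta ← theta − lr · P⁻¹ ∇f(theta), on a toy 2-D quadratic. The loss, its matrix `H`, the three preconditioners (identity, diagonal, full Newton) and every function name are illustrative assumptions. The point is that the update rule stays fixed while only the preconditioner P is swapped, which loosely mirrors the "unified way" of comparing methods described above.

```python
"""Hypothetical sketch: preconditioned gradient descent with swappable preconditioners."""
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]

# Toy quadratic loss f(theta) = 0.5 * theta^T H theta - b^T theta (illustrative values).
H: Matrix = [[4.0, 1.0], [1.0, 2.0]]
b: Vector = [1.0, 1.0]


def matvec(A: Matrix, x: Vector) -> Vector:
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def loss(theta: Vector) -> float:
    Ht = matvec(H, theta)
    return 0.5 * sum(t * h for t, h in zip(theta, Ht)) - sum(bi * t for bi, t in zip(b, theta))


def grad(theta: Vector) -> Vector:
    # Gradient of the quadratic: H theta - b.
    return [h - bi for h, bi in zip(matvec(H, theta), b)]


# A preconditioner maps a raw gradient g to the direction P^{-1} g.
Preconditioner = Callable[[Vector], Vector]


def identity(g: Vector) -> Vector:
    # P = I: plain gradient descent, no curvature information.
    return list(g)


def diagonal(g: Vector) -> Vector:
    # P = diag(H): uses only the diagonal of the curvature matrix.
    return [gi / H[i][i] for i, gi in enumerate(g)]


def newton(g: Vector) -> Vector:
    # P = H: exact 2x2 inverse via the adjugate formula (full second-order information).
    (a, c), (d, e) = H
    det = a * e - c * d
    return [(e * g[0] - c * g[1]) / det, (a * g[1] - d * g[0]) / det]


def run(precond: Preconditioner, lr: float, steps: int) -> Vector:
    # Same update rule for every method; only the preconditioner changes.
    theta = [0.0, 0.0]
    for _ in range(steps):
        p = precond(grad(theta))
        theta = [t - lr * pi for t, pi in zip(theta, p)]
    return theta


if __name__ == "__main__":
    configs = {"identity": (identity, 0.2), "diagonal": (diagonal, 1.0), "newton": (newton, 1.0)}
    for name, (pc, lr) in configs.items():
        theta = run(pc, lr=lr, steps=5)
        print(f"{name:8s} loss after 5 steps: {loss(theta):.6f}")
```

On a quadratic, the Newton preconditioner with lr = 1 lands on the minimizer in a single step, while the diagonal and identity choices need more iterations. In practice, for neural networks, P is usually an approximation of the curvature (for example a diagonal or Kronecker-factored approximation) rather than the exact Hessian, since the full matrix is too large to form or invert.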