The success of modern machine learning is due in part to the adaptive
optimization methods that have been developed to deal with the difficulties of
training large models over complex datasets. One such method is gradient
clipping: a practical procedure with limited theoretical underpi