The successes of deep learning, variational inference, and many other fields
have been aided by specialized implementations of reverse-mode automatic
differentiation (AD) to compute gradients of mega-dimensional objectives. The
AD techniques underlying these tools were designed to compute gradients that
are exact up to numerical precision, but modern machine learning mo