Value approximation using deep neural networks is at the heart of off-policy
deep reinforcement learning, and is often the primary module that provides
learning signals to the rest of the algorithm. While multi-layer perceptron
networks are universal function approximators, recent work