We devise a distributional variant of gradient temporal-difference (TD)
learning. distributional reinforcement learning has been demonstrated to
outperform the regular one in the recent study
\citep{bellemare2017distributional}. In the policy evaluation setting, we
design two new algor