Reinforcement learning (RL) in large or infinite state spaces is notoriously
challenging, both theoretically (where worst-case sample and computational
complexities must scale with state space cardinality) and experimentally (where
function approximation and policy gradient techniques