We study the continuous-time counterpart of Q-learning for reinforcement
learning (RL) under the entropy-regularized, exploratory diffusion process
formulation introduced by Wang et al. (2020). As the conventional (big)
Q-function collapses in continuous time, we consider its first-order
approximation and coin the term "(little) q-function". This function is