Abstract

The $Q$-learning algorithm is a simple and widely-used
stochastic approximation scheme for reinforcement learning, but the basic protocol can exhibit instability in conjunction with function approximation. Such instability can be observed even with