Model-free reinforcement learning algorithms, such as Q-learning, perform
poorly in the early stages of learning in noisy environments because much
effort is spent unlearning biased estimates of the state-action value function.
The bias results from selecting, among several noisy esti