Abstract

Multi-step greedy policies have been used extensively in model-based reinforcement learning (RL) and in settings where a model of the environment is available (e.g., in the game of Go). In this work, we explore the benefits of multi-step greedy policies in