In complex tasks, such as those with large combinatorial action spaces,
random exploration may be too inefficient to achieve meaningful learning
progress. In this work, we use a curriculum of progressively growing action
spaces to accelerate learning. We assume the environment is out o