Being able to reason in an environment with a large number of discrete
actions is essential to bringing reinforcement learning to a larger class of
problems. Recommender systems, industrial plants and language models are only
some of the many real-world tasks involving large numbers of