In recent years, deep reinforcement learning (DRL) algorithms have achieved
state-of-the-art performance in many challenging strategy games. Because these
games have complicated rules, an action sampled from the full discrete action
distribution predicted by the learned policy is likel