Continuous action spaces in reinforcement learning (RL) are commonly defined
as interval sets. While intervals usually capture the action bounds of a task well,
they can be challenging for learning because the typically large global action
space leads to frequent exploration of irrelevant actions. Yet, a small amount of
task knowledge can suffice to identify much smaller, state-specific sets of
relevant actions. Focusing learning on these relevant actions can substantially
improve training efficiency and effectiveness. In
this paper, we propose to focus learning on the set of relevant actions and
introduce three continuous action masking methods for exactly mapping the
action space to the state-dependent set of relevant actions. Thus, our methods
ensure that only relevant actions are executed, enhancing the predictability of
the RL agent and enabling its use in safety-critical applications. We further
derive the implications of the proposed methods for the policy gradient. Using
Proximal Policy Optimization (PPO), we evaluate our methods on three control
tasks, where the relevant action set is computed based on the system dynamics
and a relevant state set. Our experiments show that the three action masking
methods achieve higher final rewards and converge faster than the baseline
without action masking.
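The abstract does not detail the three masking methods. To illustrate what "mapping the action space to a state-dependent set of relevant actions" can mean in the simplest setting, the sketch below assumes that the relevant set for a state is an axis-aligned box [lower, upper] inside the global action interval. It also assumes that the policy outputs a latent action u in [-1, 1]^n. Under these assumptions it shows how an affine map sends u into that box, and how the map shifts the action log-density through its Jacobian. The helper names `map_to_relevant_box` and `log_abs_det_jacobian` are hypothetical, and the code is not claimed to be any of the three methods proposed in the paper.

```python
import math
from typing import List, Sequence


def map_to_relevant_box(u: Sequence[float], lower: Sequence[float],
                        upper: Sequence[float]) -> List[float]:
    """Affinely map a latent action u in [-1, 1]^n onto the box [lower, upper].

    Hypothetical helper: lower/upper stand in for a state-dependent relevant
    action set assumed to be computed elsewhere (e.g. from task knowledge).
    """
    if not len(u) == len(lower) == len(upper):
        raise ValueError("u, lower and upper must have the same length")
    action = []
    for ui, lo, hi in zip(u, lower, upper):
        if not -1.0 <= ui <= 1.0:
            raise ValueError("latent action must lie in [-1, 1]")
        if lo > hi:
            raise ValueError("relevant interval is empty")
        # u = -1 maps to lo, u = +1 maps to hi, u = 0 maps to the midpoint.
        action.append(lo + 0.5 * (ui + 1.0) * (hi - lo))
    return action


def log_abs_det_jacobian(lower: Sequence[float], upper: Sequence[float]) -> float:
    """log|det(da/du)| of the affine map above (requires lo < hi per dimension).

    By the change-of-variables formula, log p(a | s) = log p(u | s) minus this
    value. It depends only on the state-dependent bounds, not on the policy
    parameters.
    """
    return sum(math.log(0.5 * (hi - lo)) for lo, hi in zip(lower, upper))


if __name__ == "__main__":
    # Illustrative bounds for a single state: a 2-D relevant box.
    lower, upper = [0.0, -0.5], [0.4, 0.5]
    print(map_to_relevant_box([1.0, 0.0], lower, upper))    # [0.4, 0.0]
    print(map_to_relevant_box([-1.0, -1.0], lower, upper))  # [0.0, -0.5]
    print(log_abs_det_jacobian(lower, upper))               # approx. -2.3026 (log 0.1)
```

In this box-shaped special case, every executed action lies in the relevant set by construction. The log-Jacobian term depends only on the state, so it has zero gradient with respect to the policy parameters and cancels in a PPO probability ratio evaluated at the same state. This is a property of the simple map assumed here, not a summary of the paper's derivation.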

本研究论文中，我们提出了三种连续动作屏蔽方法，以精确地将动作空间映射到与状态相关的相关动作集合，从而确保只有相关动作被执行，提高增强学习代理的可预测性，并使其在安全关键应用中得到应用。实验结果显示，这三种动作屏蔽方法比没有动作屏蔽的基线方法能够获得更高的最终奖励并更快地收敛。