This study presents a benchmark for evaluating action-constrained reinforcement learning (RL) algorithms. In action-constrained RL, every action taken by the learning system must satisfy certain constraints. These constraints are crucial for ensuring the feasibility and safety of actions in real-world systems. We evaluate existing algorithms and their novel variants across multiple robotics control environments that encompass several types of action constraints. Our evaluation provides the first in-depth perspective on the field and reveals surprising insights, including the effectiveness of a straightforward baseline approach. The benchmark problems and the code used in our experiments are available online at github.com/omron-sinicx/action-constrained-RL-benchmark for further research and development.
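To make the notion of an action constraint concrete, here is a minimal sketch that assumes a hypothetical constraint: the Euclidean (L2) norm of the action vector must not exceed `max_norm`. It enforces the constraint by projecting the policy's raw output onto the feasible set. The function names and the specific constraint are illustrative assumptions. Projection is only one way to enforce a constraint, and this is not presented as the benchmark's own code or as any particular algorithm evaluated in the study.

```python
import math
from typing import List


def is_feasible(action: List[float], max_norm: float, tol: float = 1e-9) -> bool:
    """Check the hypothetical action constraint ||a||_2 <= max_norm.

    A small tolerance absorbs floating-point rounding.
    """
    return math.sqrt(sum(x * x for x in action)) <= max_norm + tol


def project_to_l2_ball(action: List[float], max_norm: float) -> List[float]:
    """Return the point closest to `action` (in Euclidean distance)
    that satisfies ||a||_2 <= max_norm."""
    norm = math.sqrt(sum(x * x for x in action))
    if norm <= max_norm:
        return list(action)  # already feasible: leave the action unchanged
    # Infeasible: rescale the action radially onto the boundary of the ball.
    return [x * max_norm / norm for x in action]


# Example: a raw policy output that violates the constraint (its norm is 5.0).
raw_action = [3.0, 4.0]
safe_action = project_to_l2_ball(raw_action, max_norm=1.0)
print(is_feasible(raw_action, 1.0), is_feasible(safe_action, 1.0))
```

For an L2-norm ball the projection has this closed form. Constraint sets with a less regular shape generally require solving an optimization problem at every step, which is part of what makes action-constrained RL challenging.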

Benchmarking Actor-Critic Deep Reinforcement Learning Algorithms for Robotics Control with Action Constraints