Shielding is a popular technique for achieving safe reinforcement learning
(RL). However, classical shielding approaches come with quite restrictive
assumptions, making them difficult to deploy in complex environments,
particularly those with continuous state or action spaces. In this paper, we
extend the more versatile approximate model-based shielding (AMBS) framework to
the continuous setting. In particular, we use Safety Gym as our testbed,
allowing for a more direct comparison of AMBS with popular constrained RL
algorithms. We also provide strong probabilistic safety guarantees for the
continuous setting. In addition, we propose two novel penalty techniques that
directly modify the policy gradient, which empirically provide more stable
convergence in our experiments.

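This paper presents a method for safe reinforcement learning in continuous environments. It extends the approximate model-based shielding (AMBS) framework to the continuous setting and proposes two novel penalty techniques that improve the stability of policy-gradient convergence.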

Leveraging Approximate Model-based Shielding for Probabilistic Safety Guarantees in Continuous Environments

Reinforcement learning (RL) has shown great potential for solving complex
tasks in a variety of domains. However, applying RL to safety-critical systems
in the real world is not easy, as many algorithms are sample-inefficient and
maximising the standard RL objective comes with no guarantees on worst-case
performance. In this paper, we propose approximate model-based shielding (AMBS),
a principled look-ahead shielding algorithm for verifying the performance of
learned RL policies with respect to a set of given safety constraints. Our algorithm
differs from other shielding approaches in that it does not require prior
knowledge of the safety-relevant dynamics of the system. We provide a strong
theoretical justification for AMBS and demonstrate superior performance to
other safety-aware approaches on a set of Atari games with state-dependent
safety labels.

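We propose approximate model-based shielding, an algorithm for verifying the performance of learned RL policies with respect to given safety constraints. Compared with other safety-aware approaches, it shows superior performance on a set of Atari games with state-dependent safety labels.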
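
The look-ahead shielding idea can be illustrated with a minimal Python sketch. This is not the authors' implementation: the one-dimensional toy dynamics in `model_step`, the labelling function `is_unsafe`, the two policies, and the horizon, rollout count and threshold values are all hypothetical choices made for illustration. The sketch samples rollouts from an approximate model, estimates the probability that the proposed action leads to an unsafe state within a fixed horizon, and falls back to a backup policy when that estimate exceeds a threshold.

```python
import random


def model_step(state, action, rng):
    """Hypothetical approximate dynamics: move by `action`, doubling the move 10% of the time."""
    slip = 1 if rng.random() < 0.1 else 0
    return state + action * (1 + slip)


def is_unsafe(state):
    """Hypothetical state-dependent safety label: positions at or beyond 5 are unsafe."""
    return state >= 5


def task_policy(state):
    """Hypothetical reward-seeking policy: always move right."""
    return 1


def backup_policy(state):
    """Hypothetical conservative policy: always move left, away from the unsafe region."""
    return -1


def estimate_violation_prob(state, first_action, policy, horizon, n_rollouts, rng):
    """Monte Carlo estimate of P(unsafe state reached within `horizon` model steps)."""
    violations = 0
    for _ in range(n_rollouts):
        s = model_step(state, first_action, rng)
        steps = 1
        # Continue the rollout under `policy` until a violation or the horizon.
        while not is_unsafe(s) and steps < horizon:
            s = model_step(s, policy(s), rng)
            steps += 1
        violations += is_unsafe(s)
    return violations / n_rollouts


def shielded_action(state, task, backup, threshold=0.1, horizon=3,
                    n_rollouts=200, rng=None):
    """Return the task action if the look-ahead deems it safe, else the backup action."""
    rng = rng or random.Random()
    proposed = task(state)
    p_violation = estimate_violation_prob(
        state, proposed, task, horizon, n_rollouts, rng
    )
    return proposed if p_violation <= threshold else backup(state)
```

In this toy setting, calling `shielded_action(3, task_policy, backup_policy)` overrides the task action, since every simulated rollout from position 3 reaches the unsafe region within three steps, while a call from position -10 leaves the task action unchanged. How the model is learned, how the backup policy is obtained, and how the rollout count relates to a formal probabilistic guarantee are left out of this sketch.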