Reinforcement learning (RL) is a promising optimal control technique for
multi-energy management systems. It does not require a model a priori, which
reduces the upfront and ongoing project-specific engineering effort, and it is
capable of learning better representations of the underlying system dynamics.
However, vanilla RL does not provide constraint satisfaction guarantees,
resulting in potentially unsafe interactions within its safety-critical
environment. In this paper, we present two novel safe RL methods, namely
SafeFallback and GiveSafe, where the safety constraint formulation is decoupled
from the RL formulation. Both methods provide hard-constraint satisfaction
guarantees, rather than soft- or chance-constraint guarantees, both during
training of a (near-)optimal policy (which involves exploratory and
exploitative, i.e. greedy, steps) and during deployment of any policy (e.g.
random agents or offline trained RL agents). They do so without solving a
mathematical program, which lowers the computational requirements and allows a
more flexible constraint function formulation (no derivative information is
required). In a
simulated multi-energy systems case study, we show that both methods start
with a significantly higher utility (i.e. a useful policy) than a vanilla RL
benchmark and an Optlayer benchmark (94.6% and 82.8% compared to 35.5% and
77.8%), and that the proposed SafeFallback method can even outperform the
vanilla RL benchmark (102.9% to 100%). We conclude that both methods are viable
safety constraint handling techniques applicable beyond RL, as demonstrated
with random policies, while still providing hard-constraint guarantees.

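This paper proposes two new safe reinforcement learning methods, SafeFallback
and GiveSafe, in which the safety constraint formulation is decoupled from the
RL formulation. They provide hard-constraint satisfaction guarantees without
the need to solve a mathematical program, which lowers the computational
requirements and allows a more flexible constraint formulation. The methods can
be applied to any policy beyond RL while still providing hard-constraint
guarantees, and their effectiveness is validated in a simulated multi-energy
systems case study.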
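
To make the idea of a decoupled safety check concrete, the following is a
minimal sketch only. The text above does not specify the SafeFallback or
GiveSafe algorithms, so this is not the authors' implementation. It assumes a
black-box constraint check (no derivatives, no mathematical program) and an
a priori safe fallback action. All names (`shield`, `is_safe`, `fallback`) and
the toy storage model are hypothetical.

```python
import random
from typing import Callable, Sequence

State = Sequence[float]
Action = float


def shield(
    policy: Callable[[State], Action],
    is_safe: Callable[[State, Action], bool],
    fallback: Callable[[State], Action],
) -> Callable[[State], Action]:
    """Wrap any policy so that only actions passing `is_safe` are returned.

    The constraint check is a plain boolean function, evaluated without
    derivative information and without solving an optimisation problem.
    """
    def safe_policy(state: State) -> Action:
        action = policy(state)
        if is_safe(state, action):
            return action
        # Assumption: the fallback action is known to be safe in every state.
        return fallback(state)

    return safe_policy


# Hypothetical toy system: a storage unit whose state of charge (SoC) must
# stay in [0, 1]; the action is the fraction charged (+) or discharged (-).
def is_safe(state: State, action: Action) -> bool:
    soc = state[0]
    return 0.0 <= soc + action <= 1.0


def fallback(state: State) -> Action:
    return 0.0  # idling leaves the SoC unchanged, hence safe for SoC in [0, 1]


def random_policy(state: State) -> Action:
    return random.uniform(-1.0, 1.0)


# The wrapper is policy-agnostic: here it shields a purely random agent.
safe_random = shield(random_policy, is_safe, fallback)
```

Because the wrapper only needs a callable policy, the same check could in
principle be placed around an exploring RL agent during training or around an
offline trained agent at deployment, under the stated assumption that a safe
fallback action is always available.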