Multi-agent control problems are a natural application area for deep reinforcement learning models with continuous action spaces. Such real-world applications, however, typically come with critical safety constraints that must not be violated. To ensure safety, we enhance the well-known multi-agent deep deterministic policy gradient (MADDPG) framework by adding a safety layer to the deep policy network. In particular, we extend the idea of linearizing the single-step transition dynamics, as was done for single-agent systems in Safe DDPG (Dalal et al., 2018), to multi-agent settings. We additionally propose to circumvent infeasibility problems in the action correction step by using soft constraints (Kerrigan & Maciejowski, 2000). Results from the theory of exact penalty functions can be used to guarantee that the soft constraints are satisfied under mild assumptions. We empirically find that the soft formulation achieves a dramatic decrease in constraint violations, making safety available even during the learning procedure.

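This paper presents a method for ensuring safety in multi-agent control problems by adding a safety layer to deep reinforcement learning models. The method builds on the idea of linearizing the single-step transition dynamics and uses soft constraints to resolve infeasibility in the action correction step, so that safe control is achieved during learning while satisfaction of the soft constraints is guaranteed.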
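As a rough illustration of the action correction step, the sketch below handles the simplest case only: a single linearized constraint c + g·a ≤ limit on the joint action a. It is not the paper's implementation. The function name `soft_safety_layer`, its parameters, and the L1 penalty weight `rho` are assumptions introduced here, and the general multi-constraint case would need a quadratic-program solver instead of this closed form.

```python
def soft_safety_layer(action, g, c, limit, rho):
    """Correct a joint action against one linearized constraint (illustrative).

    Solves, in closed form,
        min_x  0.5 * ||x - action||^2 + rho * max(0, c + g.x - limit),
    i.e. a projection with an L1-penalized (soft) constraint.
    All names are hypothetical; this is a sketch, not the paper's code.
    """
    # Predicted constraint violation for the uncorrected action.
    violation = c + sum(gi * ai for gi, ai in zip(g, action)) - limit
    g_norm_sq = sum(gi * gi for gi in g)

    # Already safe, or the action cannot influence the constraint (g = 0).
    # In the second case the hard-constrained problem may be infeasible,
    # while the soft problem still returns a well-defined action.
    if violation <= 0.0 or g_norm_sq == 0.0:
        return list(action)

    # The hard-constrained multiplier is violation / ||g||^2; the soft
    # formulation caps it at rho. If rho is at least as large as the hard
    # multiplier, both formulations give the same corrected action.
    lam = min(rho, violation / g_norm_sq)
    return [ai - lam * gi for ai, gi in zip(action, g)]


# Two agents with one-dimensional actions, concatenated into a joint action.
joint_action = [1.0, 1.0]
safe = soft_safety_layer(joint_action, g=[1.0, 1.0], c=0.0, limit=1.0, rho=10.0)
weak = soft_safety_layer(joint_action, g=[1.0, 1.0], c=0.0, limit=1.0, rho=0.2)
```

With the large penalty weight the corrected action lands on the constraint boundary, matching the hard-constrained projection. With the small weight the correction is only partial and some violation remains, which is why the penalty weight must be chosen large enough for the exact-penalty argument to apply.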

Safe Reinforcement Learning for Multi-Agent Systems with Continuous Action Spaces