In the realm of autonomous agents, ensuring safety and reliability in complex
and dynamic environments remains a paramount challenge. Safe reinforcement
learning addresses these concerns by introducing safety constraints, but still
faces challenges in navigating intricate environments such as complex driving
situations. To overcome these challenges, we present the safe constraint reward
(Safe CoR) framework, a novel method that utilizes two types of expert
demonstrations: reward expert demonstrations focusing on
performance optimization and safe expert demonstrations prioritizing safety. By
exploiting a constraint reward (CoR), our framework guides the agent to balance
the performance goal of maximizing the reward sum against safety constraints. We test the proposed
framework in diverse environments, including Safety Gym, MetaDrive, and the
real-world Jackal platform. Our proposed framework enhances the
performance of algorithms by $39\%$ and reduces constraint violations by $88\%$
on the real-world Jackal platform, demonstrating the framework's efficacy.
We expect this approach to improve real-world performance and to support the
development of safe and reliable autonomous agents.
