Although reinforcement learning (RL) is highly general and scalable, the difficulty of verifying policy behaviours poses challenges for safety-critical applications. To remedy this, we propose to apply verification methods used in control theory to learned value functions. By analyzing a simple task structure for safety preservation, we derive original theorems linking value functions to control barrier functions. Building on these theorems, we propose novel metrics for verification of value functions in safe control tasks, and practical implementation details that improve learning. Besides proposing a novel method for certificate learning, our work unlocks a wealth of verification methods in control theory for RL policies, and represents a first step towards a framework for general, scalable, and verifiable design of control systems.

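This work proposes applying verification methods from control theory to learned value functions in RL. It derives original theorems linking value functions for safety preservation to control barrier functions, and proposes new metrics for verifying value functions in safe control tasks, along with practical implementation details. In addition, the work uses verification methods from control theory to carry out certificate learning, offering a new approach to RL policy design.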

Your Value Function is a Control Barrier Function: Verification of Learned Policies using Control Theory