Constrained Reinforcement Learning (RL) has emerged as a significant research area within RL, where integrating constraints with rewards is crucial for enhancing safety and performance across diverse control tasks. For heating systems in buildings, optimizing energy efficiency while maintaining the residents' thermal comfort can be formulated naturally as a constrained optimization problem. However, solving this problem with RL may require a large amount of data, so an accurate and versatile simulator is desirable. In this paper, we propose a novel building simulator, I4B, which provides interfaces for different use cases, and we apply a model-free constrained RL algorithm, constrained Soft Actor-Critic with Linear Smoothed Log Barrier function (CSAC-LB), to the heating optimization problem. Benchmarking against baseline algorithms demonstrates CSAC-LB's efficiency in data exploration, constraint satisfaction, and performance.

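This study addresses the problem of balancing improved energy efficiency against residents' thermal comfort in building heating systems and proposes a novel approach. By applying the I4B simulator and the model-free constrained RL algorithm CSAC-LB, it significantly improves data exploration, constraint satisfaction, and performance, demonstrating practical potential for safe control.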

Application of Constrained Reinforcement Learning to Safe Heat Pump Control