Reinforcement learning (RL) is gaining popularity in energy systems control,
but its real-world applications remain limited because the actions produced by
learned policies may violate functional requirements or be infeasible for the
underlying physical system. In this work, we propose PROjected
Feasibility (PROF), a method to enforce convex operational constraints within
neural policies. Specifically, we incorporate a differentiable projection layer
within a neural network-based policy to enforce that all learned actions are
feasible. We then update the policy end-to-end by propagating gradients through
this differentiable projection layer, making the policy cognizant of the
operational constraints. We demonstrate our method on two applications:
energy-efficient building operation and inverter control. In the building
operation setting, we show that PROF maintains thermal comfort requirements
while improving energy efficiency by 4% over state-of-the-art methods. In the
inverter control setting, PROF satisfies all voltage constraints on the
IEEE 37-bus feeder system while learning to curtail as little renewable energy
as possible within its safety set.
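
To make the idea of training through a projection layer concrete, here is a minimal sketch in plain NumPy. It assumes the simplest convex feasible set, a box of lower and upper bounds, for which the Euclidean projection is a clip and its Jacobian is diagonal. The linear policy, the quadratic loss, the target, and every name below (`project_box`, `project_box_vjp`, `W`) are illustrative assumptions, not the implementation described in the abstract; general convex constraints would typically require a differentiable convex solver instead of a clip.

```python
import numpy as np

def project_box(a, lower, upper):
    """Euclidean projection of a onto the box {x : lower <= x <= upper}."""
    return np.clip(a, lower, upper)

def project_box_vjp(a, lower, upper, grad_out):
    """Backpropagate grad_out through the box projection.

    The Jacobian is diagonal: 1 where the raw action is strictly inside
    the bounds, 0 where it was clipped.
    """
    inactive = (a > lower) & (a < upper)
    return grad_out * inactive

# Hypothetical toy setup: a linear policy, raw_action = W @ state.
W = np.zeros((2, 3))
state = np.array([1.0, -0.5, 2.0])
lower = np.array([-1.0, -1.0])
upper = np.array([1.0, 1.0])
target = np.array([0.3, 2.0])  # second component lies outside the box

for _ in range(200):
    raw = W @ state
    action = project_box(raw, lower, upper)       # always feasible
    grad_action = 2.0 * (action - target)         # gradient of ||action - target||^2
    grad_raw = project_box_vjp(raw, lower, upper, grad_action)
    W -= 0.05 * np.outer(grad_raw, state)         # end-to-end gradient step

final_action = project_box(W @ state, lower, upper)
```

In this toy run the first action component converges to its target, while the second is held at the upper bound, so the executed action stays feasible at every step. Note that a clipped component receives zero gradient, which is one reason practical methods may treat the projection's gradient more carefully than this sketch does.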

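This paper proposes PROF, a method that adds a differentiable projection layer to a neural network policy so that convex operational constraints are satisfied, enabling RL training for energy systems control. We demonstrate the method on two applications and show the performance gains PROF achieves.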

Enforcing Policy Feasibility Constraints through Differentiable Projection for Energy Optimization

A fascinating aspect of nature lies in its ability to produce a large and
diverse collection of organisms that are all high-performing in their niche. By
contrast, most AI algorithms focus on finding a single efficient solution to a
given problem. Aiming for diversity in addition to performance is a convenient
way to deal with the exploration-exploitation trade-off that plays a central
role in learning. It also allows for increased robustness when the returned
collection contains several working solutions to the considered problem, making
it well-suited for real applications such as robotics. Quality-Diversity (QD)
methods are evolutionary algorithms designed for this purpose. This paper
proposes a novel algorithm, QDPG, which combines the strengths of Policy
Gradient algorithms and Quality-Diversity approaches to produce a collection of
diverse and high-performing neural policies in continuous control environments.
The main contribution of this work is the introduction of a Diversity Policy
Gradient (DPG) that exploits information at the time-step level to drive
policies towards more diversity in a sample-efficient manner. Specifically,
QDPG selects neural controllers from a MAP-Elites grid and uses two
gradient-based mutation operators to improve both quality and diversity. Our
results demonstrate that QDPG is significantly more sample-efficient than its
evolutionary competitors.
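
The loop described above (select from a MAP-Elites grid, mutate with a quality operator or a diversity operator, reinsert) can be sketched on a toy problem. Everything here is an assumption made for illustration: solutions are 2-D vectors rather than neural policies, the behavior descriptor is the vector itself, `quality_step` is plain gradient ascent on a hand-written fitness, and `diversity_step` is a simple "move away from the archive mean" stand-in, not the Diversity Policy Gradient introduced in the paper. The strict alternation between the two operators is likewise an arbitrary choice.

```python
import numpy as np

class MapElitesGrid:
    """Grid archive that keeps the best solution found in each behavior cell."""

    def __init__(self, bins, low, high):
        self.bins, self.low, self.high = bins, low, high
        self.cells = {}  # cell index -> (fitness, solution)

    def cell_of(self, descriptor):
        frac = (descriptor - self.low) / (self.high - self.low)
        idx = np.clip((frac * self.bins).astype(int), 0, self.bins - 1)
        return tuple(idx)

    def add(self, solution, fitness, descriptor):
        """Insert if the cell is empty or the newcomer is fitter."""
        key = self.cell_of(descriptor)
        if key not in self.cells or fitness > self.cells[key][0]:
            self.cells[key] = (fitness, solution.copy())
            return True
        return False

    def sample(self, rng):
        keys = list(self.cells)
        return self.cells[keys[rng.integers(len(keys))]][1]

def fitness(x):
    return -float(np.sum(x ** 2))  # toy objective, maximal at the origin

def quality_step(x, lr=0.1):
    return x + lr * (-2.0 * x)  # gradient ascent on the toy fitness

def diversity_step(x, grid, rng, lr=0.3):
    # Stand-in novelty gradient: move away from the mean of stored elites.
    elites = np.array([sol for _, sol in grid.cells.values()])
    direction = x - elites.mean(axis=0)
    if np.allclose(direction, 0.0):
        direction = rng.normal(size=x.shape)
    return x + lr * direction / np.linalg.norm(direction)

rng = np.random.default_rng(0)
grid = MapElitesGrid(bins=5, low=-1.0, high=1.0)
x0 = np.array([0.5, 0.5])
grid.add(x0, fitness(x0), x0)

for it in range(300):
    parent = grid.sample(rng)
    child = quality_step(parent) if it % 2 == 0 else diversity_step(parent, grid, rng)
    child = np.clip(child, -1.0, 1.0)
    grid.add(child, fitness(child), child)  # descriptor is the solution itself
```

The quality operator improves the elite within a region of behavior space, while the diversity operator pushes offspring into cells that are not yet filled; the grid's insertion rule is what reconciles the two.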

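This paper proposes a new algorithm, QDPG, which combines policy gradient algorithms with quality-diversity methods to generate diverse, high-performing neural controllers in continuous control environments, and which is more sample-efficient than competing evolutionary algorithms.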