Real-world systems are often formulated as constrained optimization problems.
Techniques for incorporating constraints into Neural Networks (NN), such as
Neural Ordinary Differential Equations (Neural ODEs), have been proposed.
However, these introduce hyperparameters that require manual tuning by trial
and error, which casts doubt on whether the constraints are actually
incorporated into the resulting model. This paper describes in detail the
two-stage training method for Neural ODEs, a simple, effective, and penalty
parameter-free approach to modeling constrained systems. In this approach, the
constrained optimization
problem is rewritten as two unconstrained sub-problems that are solved in two
stages. The first stage aims at finding feasible NN parameters by minimizing a
measure of constraint violation. The second stage aims to find the optimal NN
parameters by minimizing the loss function while remaining inside the feasible
region. We experimentally demonstrate that our method produces models that
satisfy the constraints and also improves their predictive performance, thus
ensuring compliance with critical system properties and helping to reduce the
amount of data required. Furthermore, we show that the proposed method
improves convergence to an optimal solution and the explainability of Neural
ODE models. Our proposed two-stage training method can be used with any NN
architecture.
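
The two-stage idea can be illustrated with a minimal sketch on a toy problem. This is not the paper's implementation: it uses a two-parameter vector instead of a Neural ODE, and the names `TARGET`, `TOL`, `stage_one`, `stage_two`, and `two_stage_train` are hypothetical. The way feasibility is kept in the second stage (re-running the first-stage minimization after each loss step) is an assumption made for illustration; the abstract does not specify the mechanism.

```python
import numpy as np

# Toy setup (assumed for illustration): the loss pulls theta toward TARGET,
# while the constraint requires theta[0] + theta[1] <= 1.
TARGET = np.array([1.0, 1.0])
TOL = 1e-12  # violation level treated as "feasible"


def loss_grad(theta):
    # Gradient of the loss ||theta - TARGET||^2.
    return 2.0 * (theta - TARGET)


def violation(theta):
    # Squared positive part of g(theta) = theta[0] + theta[1] - 1.
    g = theta[0] + theta[1] - 1.0
    return max(0.0, g) ** 2


def violation_grad(theta):
    g = theta[0] + theta[1] - 1.0
    return 2.0 * max(0.0, g) * np.ones(2)


def stage_one(theta, lr=0.1, tol=TOL, max_iter=10_000):
    # Stage 1: minimize only the constraint violation until (near) feasible.
    theta = np.asarray(theta, dtype=float).copy()
    for _ in range(max_iter):
        if violation(theta) <= tol:
            break
        theta = theta - lr * violation_grad(theta)
    return theta


def stage_two(theta, lr=0.1, n_iter=200):
    # Stage 2: minimize the loss; after each step, restore feasibility.
    # stage_one returns its input unchanged if the candidate is feasible.
    theta = np.asarray(theta, dtype=float).copy()
    for _ in range(n_iter):
        candidate = theta - lr * loss_grad(theta)
        theta = stage_one(candidate)
    return theta


def two_stage_train(theta0):
    return stage_two(stage_one(theta0))


if __name__ == "__main__":
    theta = two_stage_train(np.array([3.0, 0.0]))
    print("theta:", theta, "violation:", violation(theta))
```

Neither stage needs a penalty weight balancing the loss against the constraint, which is the property the abstract emphasizes. The only tuning knobs here are the step sizes and the feasibility tolerance.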

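This paper describes in detail a simple, effective, and penalty parameter-free two-stage training method for modeling constrained systems. The constrained optimization problem is rewritten as two unconstrained sub-problems solved in two stages: the first finds feasible neural network parameters and the second finds the optimal ones. Experiments show that the method produces models that satisfy the constraints and improves predictive performance, ensuring compliance with critical system properties and reducing data requirements. In addition, we show that the method improves convergence to an optimal solution and the explainability of Neural ODE models. Our two-stage training method is applicable to any neural network architecture.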

A Two-Stage Training Method for Modeling Constrained Systems With Neural Networks

Safe offline RL is a promising way to learn safe policies while bypassing
risky online interactions. Most existing methods enforce only soft
constraints, i.e., they constrain expected safety violations to stay below
predetermined thresholds. This can lead to potentially unsafe outcomes, which
is unacceptable in safety-critical scenarios. An alternative is to enforce the
hard constraint of zero violation. However, this can be challenging in the
offline setting, as it requires striking the right balance among three highly
intricate and correlated aspects: safety constraint satisfaction, reward
maximization, and behavior regularization imposed by offline datasets.
Interestingly, we
discover that, via reachability analysis from safe-control theory, the hard
safety constraint can be equivalently translated to identifying the largest
feasible region given the offline dataset. This seamlessly converts the
original three-part problem into a feasibility-dependent objective, i.e.,
maximizing reward value within the feasible region while minimizing safety
risks in the infeasible region. Inspired by this, we propose FISOR
(FeasIbility-guided Safe Offline RL), which allows safety constraint
adherence, reward maximization, and offline
policy learning to be realized via three decoupled processes, while offering
strong safety performance and stability. In FISOR, the optimal policy for the
translated optimization problem can be derived in a special form of weighted
behavior cloning. Thus, to extract the policy, we propose a novel
energy-guided diffusion model that does not require training a complicated
time-dependent classifier, greatly simplifying training. We compare FISOR
against baselines on the DSRL benchmark for safe offline RL. Evaluation
results show that
FISOR is the only method that can guarantee safety satisfaction in all tasks,
while achieving top returns in most tasks.
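
A feasibility-dependent objective expressed as weighted behavior cloning can be sketched roughly as follows. This is an illustrative assumption, not FISOR's actual formulation: the exponential weighting, the sign convention (a feasibility value at or below zero means "feasible"), the clipping, and the names `feasibility_weights` and `weighted_bc_loss` are all hypothetical. It uses plain weighted regression rather than the energy-guided diffusion model described in the abstract.

```python
import numpy as np


def feasibility_weights(reward_adv, safety_adv, feasibility_value,
                        alpha_reward=1.0, alpha_safety=1.0, max_weight=100.0):
    # Assumed convention: feasibility_value <= 0 marks a feasible state.
    reward_adv = np.asarray(reward_adv, dtype=float)
    safety_adv = np.asarray(safety_adv, dtype=float)
    feasible = np.asarray(feasibility_value, dtype=float) <= 0.0
    # Feasible region: favor actions with high reward advantage.
    # Infeasible region: favor actions that reduce safety risk.
    weights = np.where(feasible,
                       np.exp(alpha_reward * reward_adv),
                       np.exp(-alpha_safety * safety_adv))
    # Clip to keep a few samples from dominating the loss.
    return np.minimum(weights, max_weight)


def weighted_bc_loss(pred_actions, data_actions, weights):
    # Weighted mean of squared errors between policy and dataset actions.
    per_sample = np.sum((pred_actions - data_actions) ** 2, axis=-1)
    return float(np.sum(weights * per_sample) / np.sum(weights))


if __name__ == "__main__":
    w = feasibility_weights(reward_adv=[0.0, 5.0],
                            safety_adv=[3.0, 1.0],
                            feasibility_value=[-1.0, 2.0])
    pred = np.zeros((2, 1))
    data = np.array([[1.0], [2.0]])
    print("weights:", w, "loss:", weighted_bc_loss(pred, data, w))
```

Under these assumptions, feasibility selects which signal drives each sample's weight, so reward and safety are not traded off through a single coefficient.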

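With the safety constraint defined through the feasible region, FISOR (FeasIbility-guided Safe Offline RL) maximizes reward value within the feasible region and minimizes safety risks in the infeasible region. It is the only method that can guarantee safety satisfaction in all tasks while achieving top returns in most tasks.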