Model-free reinforcement learning methods lack an inherent mechanism to
impose behavioural constraints on the trained policies. While certain
extensions exist, they remain limited to specific types of constraints, such as
value constraints with additional reward signals or visitation density
constraints. In this work we try to unify these existing techniques and bridge
the gap with classical optimization and control theory, using a generic
primal-dual framework for value-based and actor-critic reinforcement learning
methods. The obtained dual formulations turn out to be especially useful for
imposing additional constraints on the learned policy, as an intrinsic
relationship between such dual constraints (or regularization terms) and reward
modifications in the primal is revealed. Furthermore, using this framework, we
are able to introduce some novel types of constraints, making it possible to impose
bounds on the policy's action density or on costs associated with transitions
between consecutive states and actions. From the adjusted primal-dual
optimization problems, a practical algorithm is derived that supports various
combinations of policy constraints that are automatically handled throughout
training using trainable reward modifications. The resulting $\texttt{DualCRL}$
method is examined in more detail and evaluated under different (combinations
of) constraints on two interpretable environments. The results highlight the
efficacy of the method, which ultimately provides the designer of such systems
with a versatile toolbox of possible policy constraints.
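The link between dual constraints and reward modifications can be illustrated on a single-state (bandit) toy problem. This is a minimal sketch and not DualCRL itself. It assumes an entropy-regularized softmax policy with temperature `tau`. The names `soft_best_response` and `dual_ascent`, the step size, and the example rewards, costs and budget are all hypothetical. A cost constraint $\mathbb{E}_\pi[c] \le d$ is handled by a Lagrange multiplier $\lambda \ge 0$. The multiplier enters the primal problem only as the modified reward $r - \lambda c$, and $\lambda$ is updated by projected gradient steps on the dual.

```python
import numpy as np


def soft_best_response(rewards, costs, lam, tau):
    """Entropy-regularized (softmax) policy for the modified reward r - lam * c."""
    scores = (rewards - lam * costs) / tau
    scores -= scores.max()                     # numerical stability
    pi = np.exp(scores)
    return pi / pi.sum()


def dual_ascent(rewards, costs, budget, tau=0.1, lr=0.5, steps=2000):
    """Toy dual method for max_pi E_pi[r] + tau*H(pi) s.t. E_pi[c] <= budget.

    Hypothetical illustration: the constraint only ever reaches the primal
    problem as a reward modification, with lam acting as a trainable penalty.
    """
    rewards = np.asarray(rewards, dtype=float)
    costs = np.asarray(costs, dtype=float)
    lam = 0.0                                  # Lagrange multiplier, kept >= 0
    for _ in range(steps):
        pi = soft_best_response(rewards, costs, lam, tau)  # primal best response
        violation = pi @ costs - budget                    # negative dual gradient
        lam = max(0.0, lam + lr * violation)               # projected dual step
    return soft_best_response(rewards, costs, lam, tau), lam


if __name__ == "__main__":
    costs = np.array([1.0, 0.3, 0.0])
    pi, lam = dual_ascent(rewards=[1.0, 0.6, 0.0], costs=costs, budget=0.25)
    print("policy:", pi.round(3), "expected cost:", round(float(pi @ costs), 3),
          "lambda:", round(lam, 3))
```

With these example values, $\lambda$ settles between 1.4 and 1.5, and the expected cost matches the budget to within a small tolerance. The entropy term keeps the dual function smooth, which is why plain gradient steps on $\lambda$ converge in this toy setting.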

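Using a generic primal-dual framework that connects classical optimization and control theory with value-based and actor-critic reinforcement learning methods, this work aims to unify and integrate existing techniques and to impose additional constraints on the learned policy. The resulting $\texttt{DualCRL}$ algorithm supports various combinations of policy constraints, which are handled automatically during training through trainable reward modifications. Experiments demonstrate the method's effectiveness, and it provides system designers with a toolbox of diverse policy constraints.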

A Dual Perspective of Reinforcement Learning for Imposing Policy Constraints

Learned iterative reconstructions hold great promise to accelerate
tomographic imaging with empirical robustness to model perturbations.
Nevertheless, adoption in photoacoustic tomography is hindered by the need
to repeatedly evaluate the computationally expensive forward model. Computational
feasibility can be obtained by the use of fast approximate models, but a need
to compensate model errors arises. In this work we advance the methodological
and theoretical basis for model corrections in learned image reconstructions by
embedding the model correction in a learned primal-dual framework. Here, the
model correction is jointly learned in data space coupled with a learned
updating operator in image space within an unrolled end-to-end learned
iterative reconstruction approach. The proposed formulation allows an extension
to a primal-dual deep equilibrium model providing fixed-point convergence as
well as reduced memory requirements for training. We provide theoretical and
empirical insights into the proposed models with numerical validation in a
realistic 2D limited-view setting. The model-corrected learned primal-dual
methods show excellent reconstruction quality with fast inference times and
thus provide a methodological basis for real-time capable and scalable
iterative reconstructions in photoacoustic tomography.
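Learned primal-dual schemes are commonly motivated as unrolled versions of the classical primal-dual hybrid gradient (PDHG, Chambolle-Pock) method. That method alternates a data-space (dual) update that applies the forward operator with an image-space (primal) update that applies its adjoint. The sketch below is a hedged, non-learned reference point. It runs PDHG for the Tikhonov problem $\min_x \tfrac12\|Ax-y\|^2 + \tfrac{\alpha}{2}\|x\|^2$, with a small random matrix `A` standing in for the acoustic forward model. In the approach described above, trained networks would replace the two proximal updates, and a fast approximate operator plus a learned data-space correction would stand in for `A`. None of that is modeled here, and the function name and parameters are hypothetical. Iterating the same update to convergence, rather than for a fixed number of unrolled steps, is the fixed-point view that a deep equilibrium model takes.

```python
import numpy as np


def pdhg_tikhonov(A, y, alpha, iters=3000, theta=1.0):
    """Chambolle-Pock PDHG for min_x 0.5*||A x - y||^2 + 0.5*alpha*||x||^2."""
    m, n = A.shape
    op_norm = np.linalg.norm(A, 2)        # spectral norm ||A||
    sigma = tau = 0.99 / op_norm          # ensures sigma * tau * ||A||^2 < 1
    x = np.zeros(n)                       # primal (image-space) iterate
    x_bar = x.copy()                      # extrapolated primal iterate
    p = np.zeros(m)                       # dual (data-space) iterate
    for _ in range(iters):
        # Dual update: prox of sigma*F^*, where F(z) = 0.5*||z - y||^2.
        p = (p + sigma * (A @ x_bar - y)) / (1.0 + sigma)
        # Primal update: prox of tau*G, where G(x) = 0.5*alpha*||x||^2.
        x_new = (x - tau * (A.T @ p)) / (1.0 + tau * alpha)
        x_bar = x_new + theta * (x_new - x)   # over-relaxation / extrapolation
        x = x_new
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((30, 20)) / np.sqrt(30)   # toy stand-in forward model
    y = rng.standard_normal(30)
    x = pdhg_tikhonov(A, y, alpha=0.5)
    x_ref = np.linalg.solve(A.T @ A + 0.5 * np.eye(20), A.T @ y)
    print("max deviation from closed form:", np.abs(x - x_ref).max())
```

The closed-form solution $(A^\top A + \alpha I)^{-1} A^\top y$ is used only as a check. For a realistic tomography operator, `A` would be applied matrix-free, and that cost is what motivates the approximate models discussed above.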

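This work proposes embedding a model correction in a learned primal-dual framework, providing a feasible model for fast iterative reconstruction in photoacoustic imaging that is real-time capable and scalable, with fast inference times and excellent reconstruction quality.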

Model-corrected learned primal-dual models for fast limited-view photoacoustic tomography

We consider the decentralized convex optimization problem, where multiple
agents must cooperatively minimize a cumulative objective function, with each
local function expressible as an empirical average of data-dependent losses.
State-of-the-art approaches for decentralized optimization rely on gradient
tracking, where consensus is enforced via a doubly stochastic mixing matrix.
Construction of such mixing matrices is not straightforward and requires
coordination even prior to the start of the optimization algorithm. This paper
puts forth a primal-dual framework for decentralized stochastic optimization
that obviates the need for such doubly stochastic matrices. Instead, dual
variables are maintained to track the disagreement between neighbors. The
proposed framework is flexible and is used to develop decentralized variants of
SAGA, L-SVRG, SVRG++, and SEGA algorithms. Using a unified proof, we establish
that the oracle complexity of these decentralized variants is $O(1/\epsilon)$,
matching the complexity bounds obtained for the centralized variants.
Additionally, we present a decentralized primal-dual accelerated SVRG
algorithm achieving $O(1/\sqrt{\epsilon})$ oracle complexity, again matching
the bound for the centralized accelerated SVRG. Numerical tests on the
algorithms establish their superior performance compared with
variance-reduced gradient-tracking algorithms.
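The idea of replacing a mixing matrix with dual variables on the communication links can be sketched with full (non-stochastic) gradients. This is a plain augmented-Lagrangian primal-dual gradient method, not the paper's variance-reduced algorithms. It assumes scalar quadratic local losses $f_i(x) = \tfrac12 (x - b_i)^2$. The function name, the step sizes `alpha` and `beta`, and the penalty `rho` are hypothetical. Each edge $(i, j)$ carries a multiplier for the constraint $x_i = x_j$, and every update uses only values exchanged with neighboring agents.

```python
import numpy as np


def decentralized_primal_dual(b, edges, alpha=0.1, beta=0.1, rho=1.0, iters=2000):
    """Minimize sum_i 0.5*(x_i - b_i)^2 subject to x_i = x_j on every edge.

    Hypothetical sketch: each edge e = (i, j) keeps a dual variable lam[e]
    that accumulates the disagreement x_i - x_j; no mixing matrix is formed.
    """
    b = np.asarray(b, dtype=float)
    x = np.zeros_like(b)             # one local copy of the decision variable per agent
    lam = np.zeros(len(edges))       # one dual variable per communication link
    for _ in range(iters):
        grad = x - b                 # local gradients of f_i (a new array)
        for e, (i, j) in enumerate(edges):
            diff = x[i] - x[j]       # only needs the values exchanged over edge e
            # gradient of lam_e*(x_i - x_j) + 0.5*rho*(x_i - x_j)^2
            grad[i] += lam[e] + rho * diff
            grad[j] -= lam[e] + rho * diff
        x = x - alpha * grad         # primal descent on the augmented Lagrangian
        for e, (i, j) in enumerate(edges):
            lam[e] += beta * (x[i] - x[j])   # dual ascent on the disagreement
    return x


if __name__ == "__main__":
    ring = [(i, (i + 1) % 5) for i in range(5)]   # five agents on a ring
    print(decentralized_primal_dual([1, 2, 3, 4, 5], ring).round(4))
```

On a five-node ring with $b = (1, 2, 3, 4, 5)$, every local copy approaches the network-wide minimizer 3.0. The stochastic variants named in the abstract would replace the exact local gradient `x - b` with a variance-reduced estimate.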

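This paper proposes a decentralized optimization algorithm based on a primal-dual framework that does not require doubly stochastic mixing matrices, which are difficult to construct. Instead, it maintains dual variables to track the disagreement between neighboring nodes, and the decentralized algorithms built this way perform better than those based on gradient tracking.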