Safe reinforcement learning (RL) is crucial for deploying RL agents in
real-world applications, as it aims to maximize long-term rewards while
satisfying safety constraints. However, safe RL often suffers from sample
inefficiency, requiring extensive interactions with the environment to learn a
safe policy. We propose Efficient Safe Policy Optimization (ESPO), a novel
approach that enhances the efficiency of safe RL through sample manipulation.
ESPO employs an optimization framework with three modes: maximizing rewards,
minimizing costs, and balancing the trade-off between the two. By dynamically
adjusting the sampling process based on the observed conflict between reward
and safety gradients, ESPO theoretically guarantees convergence, optimization
stability, and improved sample complexity bounds. Experiments on the
Safety-MuJoCo and Omnisafe benchmarks demonstrate that ESPO significantly
outperforms existing primal-based and primal-dual-based baselines in terms of
reward maximization and constraint satisfaction. Moreover, ESPO achieves
substantial gains in sample efficiency, requiring 25--29% fewer samples than
baselines, and reduces training time by 21--38%.
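
The three modes and the conflict-driven sampling adjustment described above can be illustrated with a small sketch. This is not the paper's algorithm: the abstract does not state the switching rule or the sampling schedule, so the slack band around the cost limit, the inner-product conflict test, and the `grow`/`shrink` factors are all assumptions made for illustration, and every function name is hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def select_mode(cost, limit, slack):
    """Pick one of three optimization modes from the current cost estimate.

    Assumption: a slack band [limit - slack, limit + slack] separates the modes.
    """
    if cost < limit - slack:
        return "maximize_reward"   # clearly safe: focus on reward
    if cost > limit + slack:
        return "minimize_cost"     # clearly unsafe: focus on cost
    return "balance"               # near the boundary: trade off both


def gradients_conflict(g_reward, g_cost):
    """Assumed conflict test: the reward-ascent direction (g_reward) and the
    cost-descent direction (-g_cost) point in opposing half-spaces."""
    return dot(g_reward, [-c for c in g_cost]) < 0.0


def next_batch_size(base, g_reward, g_cost, grow=1.5, shrink=0.5):
    """Assumed schedule: collect more samples when the gradients conflict,
    fewer when they agree."""
    factor = grow if gradients_conflict(g_reward, g_cost) else shrink
    return max(1, int(round(base * factor)))


if __name__ == "__main__":
    print(select_mode(cost=15.0, limit=10.0, slack=2.0))
    print(next_batch_size(1000, g_reward=[1.0, 0.0], g_cost=[1.0, 0.0]))
```

The intuition behind this kind of rule is that agreeing gradients give a reliable update direction from a small batch, whereas conflicting gradients benefit from a larger batch to reduce the variance of the combined update.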

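ESPO improves the efficiency of safe reinforcement learning through sample manipulation. By dynamically adjusting the sampling process to balance cost minimization against reward maximization, ESPO theoretically guarantees convergence, optimization stability, and improved sample complexity bounds. On the Safety-MuJoCo and Omnisafe benchmarks, ESPO clearly outperforms existing baselines in reward maximization and constraint satisfaction while substantially improving sample efficiency: compared with the baselines, it requires 25-29% fewer samples and reduces training time by 21-38%.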

Improving the Efficiency of Safe Reinforcement Learning through Sample Manipulation

Enhancing Efficiency of Safe Reinforcement Learning via Sample Manipulation

We propose a novel point-based representation, Gaussian surfels, to combine
the advantages of the flexible optimization procedure in 3D Gaussian points and
the surface alignment property of surfels. This is achieved by directly setting
the z-scale of 3D Gaussian points to 0, effectively flattening the original 3D
ellipsoid into a 2D ellipse. Such a design provides clear guidance to the
optimizer. By treating the local z-axis as the normal direction, it greatly
improves optimization stability and surface alignment. While the derivatives
with respect to the local z-axis computed from the covariance matrix are zero in this setting,
we design a self-supervised normal-depth consistency loss to remedy this issue.
Monocular normal priors and foreground masks are incorporated to enhance the
quality of the reconstruction, mitigating issues related to highlights and
background. We propose a volumetric cutting method to aggregate the information
of Gaussian surfels so as to remove erroneous points in depth maps generated by
alpha blending. Finally, we apply the screened Poisson reconstruction method to the
fused depth maps to extract the surface mesh. Experimental results show that
our method demonstrates superior performance in surface reconstruction compared
to state-of-the-art neural volume rendering and point-based rendering methods.
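
The flattening step can be checked numerically. The sketch below assumes the usual 3D Gaussian parameterization, a covariance of the form R diag(s_x^2, s_y^2, s_z^2) R^T with R built from a unit quaternion; the abstract does not spell this out, so treat it as an assumption. Setting s_z to 0 makes the third column of R (the local z-axis) a null direction of the covariance, which is why it can serve as the surfel normal. All function names are hypothetical.

```python
import math


def quat_to_rot(w, x, y, z):
    """Rotation matrix (row-major 3x3 list) from a quaternion, normalized first."""
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def surfel_covariance(rot, sx, sy):
    """Covariance R diag(sx^2, sy^2, 0) R^T: the z-scale is fixed to 0."""
    scales_sq = [sx * sx, sy * sy, 0.0]
    return [
        [sum(rot[i][k] * scales_sq[k] * rot[j][k] for k in range(3))
         for j in range(3)]
        for i in range(3)
    ]


def surfel_normal(rot):
    """Local z-axis (third column of R), used as the surfel normal."""
    return [rot[0][2], rot[1][2], rot[2][2]]


def mat_vec(m, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


if __name__ == "__main__":
    # Rotate 90 degrees about the x-axis, then flatten along the local z-axis.
    rot = quat_to_rot(math.cos(math.pi / 4), math.sin(math.pi / 4), 0.0, 0.0)
    cov = surfel_covariance(rot, 2.0, 3.0)
    # The covariance has no extent along the normal, so this is ~[0, 0, 0].
    print(mat_vec(cov, surfel_normal(rot)))
```

Because the covariance is degenerate along the normal, it carries no gradient signal for that direction, which is consistent with the abstract's remark that an additional normal-depth consistency loss is needed.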

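We propose a novel point-based representation, Gaussian surfels. Directly setting the z-scale of 3D Gaussian points to 0 flattens the original 3D ellipsoid into a 2D ellipse, which gives the optimization process clear guidance and markedly improves optimization stability and surface alignment. We also design a self-supervised normal-depth consistency loss to address the fact that, in this setting, the derivatives with respect to the normal direction computed from the covariance matrix are zero. By incorporating monocular normals and foreground masks, we improve reconstruction quality and mitigate issues related to highlights and background. In addition, we propose a volumetric cutting method that aggregates the information of Gaussian surfels to remove erroneous points in depth maps generated by alpha blending. Finally, we apply the screened Poisson reconstruction method to extract the surface mesh. Experiments show that our method achieves superior surface reconstruction performance compared with state-of-the-art neural volume rendering and point-based rendering methods.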

Gaussian Point Clouds for High-quality Surface Reconstruction

High-quality Surface Reconstruction using Gaussian Surfels

Kernel estimation is generally one of the key problems for blind image
super-resolution (SR). Recently, Double-DIP proposed modeling the kernel via a
network architecture prior, while KernelGAN employs a deep linear network and
several regularization losses to constrain the kernel space. However, they fail
to fully exploit the general SR kernel assumption that anisotropic Gaussian
kernels are sufficient for image SR. To address this issue, this paper proposes
a normalizing flow-based kernel prior (FKP) for kernel modeling. By learning an
invertible mapping between the anisotropic Gaussian kernel distribution and a
tractable latent distribution, FKP can be easily used to replace the kernel
modeling modules of Double-DIP and KernelGAN. Specifically, FKP optimizes the
kernel in the latent space rather than the network parameter space, which
allows it to generate reasonable kernel initialization, traverse the learned
kernel manifold and improve the optimization stability. Extensive experiments
on synthetic and real-world images demonstrate that the proposed FKP can
significantly improve the kernel estimation accuracy with fewer parameters and lower
runtime and memory usage, leading to state-of-the-art blind SR results.
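
For readers unfamiliar with the kernel family mentioned above, the following sketch generates a normalized anisotropic Gaussian blur kernel from two standard deviations and a rotation angle. It illustrates only the family of kernels that the prior is said to model, not the normalizing flow or the latent-space optimization. The parameterization (`sigma1`, `sigma2`, `theta`) and the function name are assumptions chosen for illustration.

```python
import math


def anisotropic_gaussian_kernel(size, sigma1, sigma2, theta):
    """Return a size x size kernel (list of rows) that sums to 1.

    sigma1 and sigma2 are the standard deviations along the two principal
    axes; theta (radians) rotates those axes relative to the image axes.
    """
    c, s = math.cos(theta), math.sin(theta)
    center = (size - 1) / 2.0
    kernel = []
    for row in range(size):
        line = []
        for col in range(size):
            dx, dy = col - center, row - center
            # Express the offset in the rotated principal-axis frame.
            u = c * dx + s * dy
            v = -s * dx + c * dy
            line.append(math.exp(-0.5 * ((u / sigma1) ** 2 + (v / sigma2) ** 2)))
        kernel.append(line)
    total = sum(sum(line) for line in kernel)
    return [[value / total for value in line] for line in kernel]


if __name__ == "__main__":
    k = anisotropic_gaussian_kernel(size=11, sigma1=3.0, sigma2=1.0, theta=0.0)
    # With theta = 0 the kernel is wider along x than along y.
    print(k[5][6] > k[6][5])
```

Each kernel in this family is determined by only three numbers, which suggests why a compact learned prior over the family can constrain kernel estimation more tightly than optimizing unconstrained network parameters.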

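This paper proposes a normalizing flow-based kernel prior (FKP) that effectively addresses the under-constrained kernel estimation problem. FKP optimizes the kernel in the latent space rather than in the network parameter space, which provides a reasonable kernel initialization, allows traversal of the learned kernel manifold, and improves optimization stability.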