In recent years, we have witnessed the emergence of scientific machine
learning as a data-driven tool for the analysis, by means of deep-learning
techniques, of data produced by computational science and engineering
applications. At the core of these methods is the supervised training algorithm
to learn the neural network realization, a highly non-convex optimization
problem that is usually solved using stochastic gradient methods. However,
distinct from deep-learning practice, scientific machine-learning training
problems feature a much larger volume of smooth data and better
characterizations of the empirical risk functions, which make them suited for
conventional solvers for unconstrained optimization. We introduce a lightweight
software framework built on top of the Portable, Extensible Toolkit for
Scientific Computation (PETSc) to bridge the gap between deep-learning software and
conventional solvers for unconstrained minimization. We empirically demonstrate
the superior efficacy of a trust region method based on the Gauss-Newton
approximation of the Hessian in reducing the generalization errors arising
from regression tasks when learning surrogate models for a wide range of
scientific machine-learning techniques and test cases. All the conventional
second-order solvers tested, including L-BFGS and inexact Newton with
line search, compare favorably, either in terms of cost or accuracy, with the
adaptive first-order methods used to validate the surrogate models.
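
As a rough illustration of the kind of solver the abstract refers to, the following is a minimal NumPy sketch of a trust region method that uses the Gauss-Newton approximation of the Hessian on a small least-squares regression. It is not the PETScML API and does not use PETSc; the exponential model, the dogleg subproblem solver, the radius-update constants, and all function names are assumptions chosen for this example only.

```python
import numpy as np

def model(theta, x):
    # Hypothetical two-parameter regression model: a * exp(b * x)
    a, b = theta
    return a * np.exp(b * x)

def residual(theta, x, y):
    return model(theta, x) - y

def jacobian(theta, x):
    # Derivatives of the residual with respect to (a, b)
    a, b = theta
    e = np.exp(b * x)
    return np.column_stack([e, a * x * e])

def dogleg_step(g, B, delta):
    # Approximately minimize g.p + 0.5 p.B.p subject to ||p|| <= delta
    p_gn = -np.linalg.solve(B, g)            # full Gauss-Newton step
    if np.linalg.norm(p_gn) <= delta:
        return p_gn
    p_sd = -(g @ g) / (g @ B @ g) * g        # minimizer along steepest descent
    n_sd = np.linalg.norm(p_sd)
    if n_sd >= delta:
        return delta * p_sd / n_sd
    # Move from p_sd toward p_gn until the trust region boundary is reached
    d = p_gn - p_sd
    a = d @ d
    b = 2.0 * (p_sd @ d)
    c = p_sd @ p_sd - delta ** 2
    tau = (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return p_sd + tau * d

def gauss_newton_trust_region(theta0, x, y, delta=1.0, max_iter=100, tol=1e-10):
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        r = residual(theta, x, y)
        J = jacobian(theta, x)
        g = J.T @ r                           # gradient of 0.5 * ||r||^2
        if np.linalg.norm(g) < tol:
            break
        # Gauss-Newton Hessian approximation; the tiny shift is an assumption
        # added here to keep the linear solve well posed
        B = J.T @ J + 1e-12 * np.eye(theta.size)
        p = dogleg_step(g, B, delta)
        predicted = -(g @ p + 0.5 * p @ B @ p)
        r_new = residual(theta + p, x, y)
        actual = 0.5 * (r @ r) - 0.5 * (r_new @ r_new)
        rho = actual / predicted              # agreement between model and loss
        if rho < 0.25:
            delta *= 0.25                     # poor agreement: shrink the region
        elif rho > 0.75 and np.isclose(np.linalg.norm(p), delta):
            delta *= 2.0                      # good agreement at the boundary: grow
        if rho > 1e-4:
            theta = theta + p                 # accept only steps that reduce the loss
    return theta

# Noiseless synthetic data generated from known parameters
x = np.linspace(0.0, 1.0, 50)
theta_true = np.array([2.0, -1.5])
y = model(theta_true, x)
theta_fit = gauss_newton_trust_region([1.0, 0.0], x, y)
```

On this noiseless toy problem the fit should recover the generating parameters closely. Real training problems would replace the explicit Jacobian with automatic differentiation and solve the subproblem iteratively rather than by forming the matrix.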

PETScML: Second-order solvers for training regression problems in Scientific Machine Learning

To apply reinforcement learning (RL) to real-world applications, agents are
required to adhere to the safety guidelines of their respective domains. Safe
RL can effectively handle the guidelines by converting them into constraints of
the RL problem. In this paper, we develop a safe distributional RL method based
on the trust region method, which can satisfy constraints consistently.
However, policies may not meet the safety guidelines due to the estimation bias
of distributional critics, and importance sampling required for the trust
region method can hinder performance due to its significant variance. Hence, we
enhance safety performance through the following approaches. First, we train
distributional critics to have low estimation biases using proposed target
distributions in which bias and variance can be traded off. Second, we propose novel
surrogates for the trust region method expressed with Q-functions using the
reparameterization trick. Additionally, depending on initial policy settings,
there may be no policy within a trust region that satisfies the constraints. To
handle this infeasibility issue, we propose a gradient integration method that is
guaranteed to find a policy satisfying all constraints from an unsafe initial
policy. In extensive experiments, the proposed method with risk-averse
constraints shows minimal constraint violations while achieving high returns
compared to existing safe RL methods.
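
To make the reparameterization trick mentioned above concrete, here is a minimal NumPy sketch that estimates the gradient of an expected Q-value with respect to the parameters of a Gaussian policy. It is not the surrogate proposed in the paper; the quadratic toy Q-function, the one-dimensional action, and all names are assumptions made purely for illustration.

```python
import numpy as np

def q_value(action):
    # Hypothetical toy Q-function, maximized at action = 2
    return -(action - 2.0) ** 2

def dq_daction(action):
    # Derivative of the toy Q-function with respect to the action
    return -2.0 * (action - 2.0)

def reparam_gradients(mu, sigma, n_samples=200_000, seed=0):
    # Write the sampled action as a deterministic function of the policy
    # parameters and parameter-free noise: action = mu + sigma * eps
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n_samples)
    action = mu + sigma * eps
    dq = dq_daction(action)
    grad_mu = dq.mean()              # chain rule with d(action)/d(mu) = 1
    grad_sigma = (dq * eps).mean()   # chain rule with d(action)/d(sigma) = eps
    return grad_mu, grad_sigma

grad_mu, grad_sigma = reparam_gradients(mu=0.5, sigma=0.3)
```

For this toy Q-function the expectation has the closed form -((mu - 2)^2 + sigma^2), so the Monte Carlo estimates can be checked against the analytic gradients -2 (mu - 2) and -2 sigma.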

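This paper proposes a safe distributional RL method based on the trust region method. It comprises a reduction of the estimation bias of distributional critics, novel surrogates for the trust region method expressed with Q-functions, and a gradient integration method that finds a policy satisfying all constraints from an unsafe initial policy. Experiments show that the method incurs minimal constraint violations while achieving high returns.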