Deep neural networks are easily fooled by small perturbations known as
adversarial attacks. Adversarial Training (AT) is a technique aimed at learning
features robust to such attacks and is widely regarded as a very effective
defense. However, the computational cost of such training can be prohibitive as
the network size and input dimensions grow. Inspired by the relationship
between robustness and curvature, we propose a novel regularizer which
incorporates first and second order information via a quadratic approximation
to the adversarial loss. The worst-case quadratic loss is approximated via an
iterative scheme. It is shown that using only a single iteration in our
regularizer achieves stronger robustness than prior gradient and curvature
regularization schemes, avoids gradient obfuscation, and, with additional
iterations, achieves strong robustness with significantly lower training time
than AT. Further, it retains the interesting facet of AT that networks learn
features which are well-aligned with human perception. We demonstrate
experimentally that our method produces higher-quality human-interpretable
features than other geometric regularization techniques. These robust features
are then used to provide human-friendly explanations of model predictions.
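
The abstract above describes a regularizer built from a quadratic approximation to the adversarial loss, with the worst case found iteratively. Below is a minimal sketch of that idea under assumptions that are not taken from the source: a toy logistic loss with an analytic input gradient and Hessian, an L2 ball of radius `eps`, and plain projected gradient ascent as the iterative scheme. All function names are hypothetical, and the paper's actual scheme may differ.

```python
import numpy as np

def logistic_loss_terms(w, x, y):
    """Loss, input-gradient and input-Hessian of log(1 + exp(-y * w.x)), y in {-1, +1}."""
    z = y * (w @ x)
    s = 1.0 / (1.0 + np.exp(z))              # sigmoid(-z)
    loss = np.log1p(np.exp(-z))
    grad = -y * s * w                        # d loss / d x
    hess = s * (1.0 - s) * np.outer(w, w)    # d^2 loss / d x^2 (uses y^2 = 1)
    return loss, grad, hess

def quad_model(g, H, delta):
    """Second-order Taylor increment of the loss: g.delta + 0.5 * delta.H.delta."""
    return g @ delta + 0.5 * delta @ H @ delta

def worst_case_quadratic(g, H, eps, n_iter=10):
    """Approximately maximize quad_model over ||delta||_2 <= eps.

    Assumed scheme: normalized projected gradient ascent, started from the
    first-order maximizer (the gradient direction scaled to the ball radius).
    """
    delta = eps * g / (np.linalg.norm(g) + 1e-12)
    step = eps / 2
    for _ in range(n_iter):
        grad = g + H @ delta                 # gradient of the quadratic model
        delta = delta + step * grad / (np.linalg.norm(grad) + 1e-12)
        norm = np.linalg.norm(delta)
        if norm > eps:                       # project back onto the ball
            delta = delta * (eps / norm)
    return delta

# Toy values chosen for illustration only.
w = np.array([1.0, -2.0, 0.5])
x = np.array([0.3, 0.1, -0.4])
y = 1.0
eps = 0.1

loss, g, H = logistic_loss_terms(w, x, y)
delta = worst_case_quadratic(g, H, eps)
regularizer = quad_model(g, H, delta)        # candidate penalty added to the clean loss
print(loss, regularizer)
```

In a training loop the penalty would be added to the clean loss with a weighting coefficient; in a deep network the Hessian would normally be accessed through Hessian-vector products rather than formed explicitly.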

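We propose a novel regularizer based on a quadratic approximation of the adversarial loss, with the worst-case quadratic loss approximated by an iterative scheme. The method achieves strong robustness while avoiding gradient obfuscation and reducing training time. Experiments show that it produces higher-quality human-interpretable features than other geometric regularization techniques, and that these robust features can be used to provide human-friendly explanations of model predictions.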

Second Order Optimization for Adversarial Robustness and Interpretability

Reinforcement learning (RL) with sparse rewards is a major challenge. We
propose \emph{Hindsight Trust Region Policy Optimization} (HTRPO), a new RL
algorithm that extends the highly successful TRPO algorithm with
\emph{hindsight} to tackle the challenge of sparse rewards. Hindsight refers to
the algorithm's ability to learn from information across goals, including ones
not intended for the current task. HTRPO leverages two main ideas. It
introduces QKL, a quadratic approximation to the KL divergence constraint on
the trust region, leading to reduced variance in KL divergence estimation and
improved stability in policy updates. It also presents Hindsight Goal
Filtering (HGF) to select conducive hindsight goals. In experiments, we
evaluate HTRPO in various sparse reward tasks, including simple benchmarks,
image-based Atari games, and simulated robot control. Ablation studies indicate
that QKL and HGF contribute greatly to learning stability and high performance.
Comparison results show that in all tasks, HTRPO consistently outperforms both
TRPO and HPG, a state-of-the-art algorithm for RL with sparse rewards.
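
The abstract credits QKL, a quadratic approximation to the KL divergence, with lower-variance KL estimates. The source does not give the formula, so the sketch below assumes one standard second-order surrogate: for nearby distributions, KL(p || q) is approximately the mean under p of 0.5 * (log p - log q)^2. The two toy action distributions and the function names are hypothetical.

```python
import numpy as np

def kl_terms(logp, logq):
    """Per-sample terms whose mean under p is KL(p || q)."""
    return logp - logq

def qkl_terms(logp, logq):
    """Per-sample terms of the assumed quadratic surrogate 0.5 * (log p - log q)^2."""
    return 0.5 * (logp - logq) ** 2

p = np.array([0.5, 0.3, 0.2])       # "old" policy over 3 actions (toy values)
q = np.array([0.48, 0.31, 0.21])    # nearby "new" policy (toy values)

# Exact KL for reference, computed from the full distributions.
kl_exact = float(np.sum(p * (np.log(p) - np.log(q))))

# Monte Carlo estimates from actions sampled under the old policy.
rng = np.random.default_rng(0)
actions = rng.choice(len(p), size=10_000, p=p)
logp, logq = np.log(p[actions]), np.log(q[actions])
kl_mc = kl_terms(logp, logq)        # terms can be negative, so they partly cancel
qkl_mc = qkl_terms(logp, logq)      # terms are always non-negative

print("exact KL      :", kl_exact)
print("sampled KL    :", kl_mc.mean(), "per-sample std:", kl_mc.std())
print("sampled QKL   :", qkl_mc.mean(), "per-sample std:", qkl_mc.std())
```

Because every quadratic term is non-negative and second order in the log-ratio, its per-sample spread is much smaller than that of the signed first-order terms when the two policies are close, which is the regime a trust region enforces.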

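We propose a new reinforcement learning algorithm, Hindsight Trust Region Policy Optimization (HTRPO), which uses hindsight to improve performance under sparse rewards and introduces two techniques, QKL and HGF, to improve learning stability and performance. We evaluate HTRPO on a variety of sparse-reward tasks, including simple benchmarks, image-based Atari games, and simulated robot control. Ablation studies show that QKL and HGF contribute greatly to learning stability and high performance. Comparisons show that HTRPO consistently outperforms TRPO and HPG on all tasks.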

Hindsight Trust Region Policy Optimization

This paper considers decentralized consensus optimization problems where
nodes of a network have access to different summands of a global objective
function. Nodes cooperate to minimize the global objective by exchanging
information with neighbors only. A decentralized version of the alternating
directions method of multipliers (DADMM) is a common method for solving this
category of problems. DADMM exhibits a linear convergence rate to the optimal
objective but its implementation requires solving a convex optimization problem
at each iteration. This can be computationally costly and may result in large
overall convergence times. The decentralized quadratically approximated ADMM
algorithm (DQM), which minimizes a quadratic approximation of the objective
function that DADMM minimizes at each iteration, is proposed here. The
consequent reduction in computational time is shown to have minimal effect on
convergence properties. Convergence still proceeds at a linear rate with a
guaranteed constant that is asymptotically equivalent to the DADMM linear
convergence rate constant. Numerical results demonstrate advantages of DQM
relative to DADMM and other alternatives in a logistic regression problem.
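
The abstract describes DQM as replacing the convex subproblem that DADMM solves at each node with a quadratic approximation, which turns each update into a single linear solve. The sketch below illustrates this under stated assumptions: the DADMM subproblem is taken in its common form (local objective plus a linear dual term plus a penalty of weight `c` toward neighbor averages), the local objective is replaced by its second-order Taylor expansion at the current iterate, and the network is a 4-node ring. The local objectives, data and function names are hypothetical toy choices, not the paper's experiment.

```python
import numpy as np

def dqm(grad, hess, neighbors, dim, c=1.0, n_iter=2000):
    """Sketch of a quadratically approximated decentralized ADMM iteration.

    grad(i, x), hess(i, x): gradient and Hessian of node i's local objective.
    neighbors[i]: list of node indices adjacent to node i.
    """
    n = len(neighbors)
    x = np.zeros((n, dim))      # one primal iterate per node
    phi = np.zeros((n, dim))    # one dual iterate per node
    for _ in range(n_iter):
        x_new = np.empty_like(x)
        for i, nbrs in enumerate(neighbors):
            d_i = len(nbrs)
            H = hess(i, x[i])
            # Stationarity of: quadratic model of f_i + phi_i.x
            #                  + c * sum_j ||x - (x_i + x_j) / 2||^2
            rhs = (H @ x[i] - grad(i, x[i]) - phi[i]
                   + c * d_i * x[i] + c * x[nbrs].sum(axis=0))
            x_new[i] = np.linalg.solve(H + 2 * c * d_i * np.eye(dim), rhs)
        for i, nbrs in enumerate(neighbors):
            # Dual ascent on the disagreement with neighbors.
            phi[i] += c * (len(nbrs) * x_new[i] - x_new[nbrs].sum(axis=0))
        x = x_new
    return x

# Toy local objectives: f_i(x) = 0.5 * ||x - a_i||^2 + log(1 + exp(b_i . x))
a = np.array([[1.0, 0.0], [0.0, 2.0], [-1.0, 1.0], [2.0, -1.0]])
b = np.array([[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2], [0.3, 0.3]])

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def grad(i, x):
    return (x - a[i]) + sigmoid(b[i] @ x) * b[i]

def hess(i, x):
    s = sigmoid(b[i] @ x)
    return np.eye(2) + s * (1.0 - s) * np.outer(b[i], b[i])

ring = [[1, 3], [0, 2], [1, 3], [0, 2]]   # 4-node ring, neighbors only
x = dqm(grad, hess, ring, dim=2)
print(x)                                   # rows should agree across nodes
```

At a fixed point the rows of `x` coincide and the local gradients sum to zero, which is the optimality condition of the global objective. An exact DADMM step would instead need an inner iterative solver at every node, since the logistic term has no closed-form minimizer.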

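This paper proposes a decentralized algorithm for minimizing a global objective function: the decentralized quadratically approximated ADMM (DQM). At each iteration, DQM minimizes a quadratic approximation of the objective that DADMM minimizes. This reduces computational cost while still converging linearly, with a rate constant asymptotically equivalent to that of DADMM. On a logistic regression problem, DQM shows advantages over DADMM and other alternatives.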