In high-stakes scenarios such as medical treatment and autopiloting, it is risky
or even infeasible to collect online experimental data to train the agent.
Simulation-based training can alleviate this issue, but it may suffer from
inherent mismatches between the simulator and the real environment. It is
therefore imperative to use the simulator to learn a policy that is robust in
real-world deployment. In this work, we consider policy learning for Robust
Markov Decision Processes (RMDP), where the agent seeks a policy that is robust
to unexpected perturbations of the environment. Specifically, we focus
on the setting where the training environment can be characterized as a
generative model and a constrained perturbation can be added to the model
during testing. Our goal is to identify a near-optimal robust policy for the
perturbed testing environment, which introduces additional technical
difficulties as we need to simultaneously estimate the training environment
uncertainty from samples and find the worst-case perturbation for testing. To
address this, we propose a generic method that formalizes the perturbation
as an opponent to obtain a two-player zero-sum game, and further show that the
Nash equilibrium corresponds to the robust policy. We prove that, with a
polynomial number of samples from the generative model, our algorithm can find
a near-optimal robust policy with high probability. Our method can handle
general perturbations under some mild assumptions and can also be
extended to more complex problems such as robust partially observable Markov
decision processes, thanks to the game-theoretic formulation.
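
The game view of a robust MDP can be illustrated with a small sketch. It is not the algorithm from the abstract: it assumes a known tabular model rather than samples from a generative model, and it assumes one specific perturbation class (an adversary that may move a fraction `rho` of the next-state probability mass to any state). Under that assumption the adversary's best response is to send the mass to the lowest-valued state, so each backup is a max over actions of a min over perturbations. The toy MDP, `rho`, and all function names are hypothetical.

```python
def robust_value_iteration(P, R, gamma=0.9, rho=0.0, iters=500):
    """Value iteration against an adversary that may redirect a fraction
    rho of the next-state probability mass (an assumed perturbation class)."""
    n_states = len(P)

    def q_value(V, s, a):
        nominal = sum(p * v for p, v in zip(P[s][a], V))
        worst = min(V)  # adversary's best response: move mass to the worst state
        return R[s][a] + gamma * ((1.0 - rho) * nominal + rho * worst)

    V = [0.0] * n_states
    for _ in range(iters):
        # agent maximizes over actions; the inner min is already inside q_value
        V = [max(q_value(V, s, a) for a in range(len(P[s])))
             for s in range(n_states)]
    policy = [max(range(len(P[s])), key=lambda a: q_value(V, s, a))
              for s in range(n_states)]
    return V, policy


# Hypothetical two-state MDP: P[s][a] is the next-state distribution.
P = [
    [[1.0, 0.0], [0.5, 0.5]],  # state 0: action 0 stays, action 1 may fall to state 1
    [[0.0, 1.0], [0.0, 1.0]],  # state 1: absorbing
]
R = [[1.0, 4.0], [0.0, 0.0]]   # R[s][a]: immediate reward

V_nom, pi_nom = robust_value_iteration(P, R, rho=0.0)  # no perturbation
V_rob, pi_rob = robust_value_iteration(P, R, rho=0.1)  # perturbed test environment
print(V_nom, pi_nom)
print(V_rob, pi_rob)
```

In this toy problem the nominal and robust policies choose different actions in state 0, which shows why a policy trained only on the nominal model can be suboptimal once the test environment is perturbed.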

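Training an agent in a simulator to learn a robust policy addresses the infeasibility of online data collection in high-stakes settings such as medical treatment and autonomous driving. This work represents the training environment as a generative model and proposes a game-theoretic algorithm that handles both test-time perturbations and uncertainty about the environment, yielding a near-optimal robust policy.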

Policy Learning for Robust Markov Decision Processes Based on a Mismatched Generative Model

Policy Learning for Robust Markov Decision Process with a Mismatched Generative Model

The existing Neural ODE formulation relies on explicit knowledge of the
termination time. We extend Neural ODEs to implicitly defined termination
criteria modeled by neural event functions, which can be chained together and
differentiated through. Neural Event ODEs are capable of modeling discrete and
instantaneous changes in a continuous-time system, without prior knowledge of
when these changes should occur or how many such changes should exist. We test
our approach on modeling hybrid discrete-continuous systems such as
switching dynamical systems and collisions in multi-body systems, and we propose
simulation-based training of point processes with applications in discrete
control.
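
The idea of an implicitly defined termination time can be sketched as follows. This is a minimal illustration, not the method from the abstract: the event function here is a hand-written height check standing in for a learned neural event function, the solver is a fixed-step RK4 with bisection on the final step, and nothing is differentiated. The names `rk4_step` and `integrate_until_event` and the falling-ball system are assumptions made for the example.

```python
import math


def rk4_step(f, z, dt):
    """One classical Runge-Kutta step for dz/dt = f(z)."""
    k1 = f(z)
    k2 = f([zi + 0.5 * dt * ki for zi, ki in zip(z, k1)])
    k3 = f([zi + 0.5 * dt * ki for zi, ki in zip(z, k2)])
    k4 = f([zi + dt * ki for zi, ki in zip(z, k3)])
    return [zi + dt / 6.0 * (a + 2 * b + 2 * c + d)
            for zi, a, b, c, d in zip(z, k1, k2, k3, k4)]


def integrate_until_event(f, event, z0, dt=0.01, t_max=10.0, tol=1e-10):
    """Integrate until event(z) changes sign; the stopping time is not given
    in advance but located by bisection inside the step where the sign flips."""
    t, z = 0.0, list(z0)
    while t < t_max:
        z_next = rk4_step(f, z, dt)
        if event(z) * event(z_next) <= 0.0:  # event lies inside this step
            lo, hi = 0.0, dt
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                if event(z) * event(rk4_step(f, z, mid)) <= 0.0:
                    hi = mid
                else:
                    lo = mid
            return t + hi, rk4_step(f, z, hi)
        t, z = t + dt, z_next
    return None, z  # no event before t_max


# Hypothetical system: a ball dropped from 10 m, state z = [height, velocity].
G = 9.81
dynamics = lambda z: [z[1], -G]
hits_ground = lambda z: z[0]  # event fires when the height crosses zero

t_hit, z_hit = integrate_until_event(dynamics, hits_ground, [10.0, 0.0])
print(t_hit, z_hit)
```

Chaining events would amount to restarting the integration from an updated state after each detected event, for example with the velocity reversed for a bounce.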

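Extends neural ordinary differential equations (Neural ODEs) with implicit termination criteria represented by neural event functions, so that discrete, instantaneous changes in continuous-time systems can be modeled; the approach is applied to switching dynamical systems and collisions in multi-body systems, and a simulation-based training method for point processes is proposed.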

Learning Neural Event Functions for Ordinary Differential Equations

Learning Neural Event Functions for Ordinary Differential Equations

Simulation-based training (SBT) is gaining popularity as a low-cost and
convenient training technique in a wide range of applications. However, for an
SBT platform to be fully utilized as an effective training tool, it is
essential that feedback on performance be provided automatically and in real
time during training. This paper aims to develop an efficient and
effective feedback generation method for the provision of real-time feedback in
SBT. Existing methods either have low effectiveness in improving novice skills
or suffer from low efficiency, which prevents their use in real time. In this
paper, we propose a neural-network-based method that generates feedback using
an adversarial technique. The proposed method uses a bounded adversarial update
to minimize an L1-regularized loss via back-propagation. We show empirically
that the proposed method can generate simple yet effective feedback. It was
also observed to be more effective and more efficient than existing methods,
making it a promising option for real-time feedback generation in SBT.
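
A bounded update that minimizes an L1-regularized loss can be sketched as below. This is an illustrative stand-in, not the paper's implementation: it assumes a fixed logistic model in place of the neural network, treats feedback as a change `delta` to the novice's feature vector, and uses a proximal-gradient step (soft-thresholding) followed by clipping to `[-eps, eps]`. The weights, the input, and the names `generate_feedback`, `eps`, and `lam` are all hypothetical.

```python
import math


def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


def soft_threshold(v, t):
    """Proximal operator of t * |v|; drives small entries exactly to zero."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def expert_probability(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def generate_feedback(w, b, x, eps=1.0, lam=0.05, lr=0.1, steps=200):
    """Find a bounded, sparse change delta that makes x look more expert-like
    by minimizing -log p(x + delta) + lam * ||delta||_1 with |delta_i| <= eps."""
    delta = [0.0] * len(x)
    for _ in range(steps):
        p = expert_probability(w, b, [xi + di for xi, di in zip(x, delta)])
        # gradient of -log p with respect to delta_i is -(1 - p) * w_i
        grad = [-(1.0 - p) * wi for wi in w]
        delta = [soft_threshold(di - lr * gi, lr * lam)
                 for di, gi in zip(delta, grad)]
        delta = [max(-eps, min(eps, di)) for di in delta]  # bounded update
    return delta


# Hypothetical classifier and novice feature vector.
w, b = [2.0, 0.01, -1.5], -1.0
x = [0.0, 0.0, 0.5]

delta = generate_feedback(w, b, x)
p_before = expert_probability(w, b, x)
p_after = expert_probability(w, b, [xi + di for xi, di in zip(x, delta)])
print(delta, p_before, p_after)
```

The L1 term leaves the weakly relevant second feature untouched, so the resulting feedback names only the features worth changing, which is one way to read "simple, yet effective".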

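This paper proposes a real-time feedback generation method based on neural networks and an adversarial technique. It minimizes an L1-regularized loss through bounded adversarial updates and back-propagation, producing feedback efficiently and effectively; compared with existing methods it is more effective and more efficient, and it has broad application prospects in simulation-based training.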