Reinforcement learning (RL) agents need to be robust to variations in safety-critical environments. While system identification methods provide a way to infer the variation from online experience, they can fail in settings where fast identification is not possible. Another dominant approach is robust RL, which produces a policy that can handle worst-case scenarios, but these methods are generally designed to achieve robustness to a single uncertainty set that must be specified at training time. Towards a more general solution, we formulate the multi-set robustness problem: learning a policy that is robust to different perturbation sets. We then design an algorithm that enjoys the benefits of both system identification and robust RL: it reduces uncertainty where possible given a few interactions, but can still act robustly with respect to the remaining uncertainty. On a diverse set of control tasks, our approach demonstrates improved worst-case performance on new environments compared to prior methods based on system identification alone and on robust RL alone.
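
As a rough sketch of the multi-set robustness problem described above, and not necessarily the paper's exact formulation: write $J(\pi, \xi)$ for the expected return of policy $\pi$ under environment parameters $\xi$, $\Xi$ for an uncertainty set of such parameters, and $p(\Xi)$ for a distribution over uncertainty sets. All of these symbols are assumptions introduced here for illustration. Standard robust RL fixes one set and optimizes the worst case over it, while a multi-set version would ask a set-conditioned policy to be robust to whichever set it is given:

```latex
% Single-set robust RL: one uncertainty set \Xi, fixed at training time.
\[
  \max_{\pi} \; \min_{\xi \in \Xi} \; J(\pi, \xi)
\]

% Multi-set robustness (illustrative assumption): the policy is conditioned
% on the set \Xi and should perform well in the worst case of each set.
\[
  \max_{\pi} \; \mathbb{E}_{\Xi \sim p(\Xi)}
  \Big[ \min_{\xi \in \Xi} J\big(\pi(\cdot \mid s, \Xi),\, \xi\big) \Big]
\]
```

Under this sketch, a set $\Xi$ that shrinks to a single point removes the inner minimum, which resembles acting with a fully identified system, while taking $\Xi$ to be the full parameter range recovers the single-set robust objective.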

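This paper proposes a more general solution to the robustness problem in reinforcement learning. It designs an algorithm that combines the strengths of system identification and robust RL to handle uncertainty across different settings, and it achieves better worst-case performance than prior methods on a range of control tasks.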

Robust Policy Learning over Multiple Uncertainty Sets