Robust reinforcement learning (RL) aims at learning a policy that optimizes the worst-case performance over an uncertainty set. Given nominal Markov decision process (N-MDP) that generates samples for training, the set contains MDPs obtained by some perturbations from N-MDP. In this paper, we introduce a new uncertainty set containing more realistic MDPs in practice than the existing sets. Using this uncertainty set, we present a robust RL, named ARQ-Learning, for tabular cases. Also, we characterize the finite-time error bounds and prove that it converges as fast as Q-Learning and robust Q-Learning (i.e., the state-of-the-art robust RL method) while providing better robustness for real applications. We propose {\em pessimistic agent} that efficiently tackles the key bottleneck for the extension of ARQ-Learning into large or continuous state spaces. Using this technique, we first propose PRQ-Learning. To the next, combining this with DQN and DDPG, we develop PR-DQN and PR-DDPG, respectively. We emphasize that our technique can be easily combined with the other popular model-free methods. Via experiments, we demonstrate the superiority of the proposed methods in various RL applications with model uncertainties.

介绍了一种新的不确定性集合并基于此提出了一种名为ARQ-Learning的鲁棒强化学习方法，同时还提出一种能高效解决ARQ-Learning在大规模或连续状态空间下的问题的技术，最终将其应用于各种存在模型不确定性的强化学习应用中。

实用鲁棒强化学习：邻域不确定性集和双代理算法