TL;DR本文提出了一种分布式抗干扰强化学习算法,即Robust Phased Value Learning算法,该算法针对四种不同的差距度量指标的不确定性集合进行求解,得到的结果在样本复杂度方面比现有结果具有更好的一致性。
Abstract
We consider the problem of learning a control policy that is robust against the parameter mismatches between the training environment and testing environment. We formulate this as a distributionally robust reinforcement learning (DR-RL) problem where the objective is to learn the polic