BriefGPT.xyz
Jun, 2023
State Regularized Policy Optimization on Data with Dynamics Shift
Zhenghai Xue, Qingpeng Cai, Shuchang Liu, Dong Zheng, Peng Jiang...
TL;DR
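The method learns the stationary state distribution from data that share a similar environment structure but differ in dynamics, then uses that distribution to regularize the policy. The resulting SRPO algorithm addresses the problem of reinforcement learning algorithms being trained on data with mismatched distributions, and experiments confirm its effectiveness.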
Abstract
In many real-world scenarios, reinforcement learning (RL) algorithms are trained on data with dynamics shift, i.e., with different underlying environment dynamics. A majority of current methods address such issue