BriefGPT.xyz
Dec, 2021
Sample-Efficient Reinforcement Learning via Conservative Model-Based Actor-Critic
Zhihai Wang, Jie Wang, Qi Zhou, Bin Li, Houqiang Li
TL;DR
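This work proposes Conservative Model-Based Actor-Critic (CMBAC), which learns Q-value functions from multiple inaccurate models and optimizes the policy using the average of the bottom-k estimates. The approach achieves high sample efficiency and performs especially well in noisy environments.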
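The bottom-k averaging step mentioned above can be illustrated with a minimal sketch. It assumes the Q-value estimates from the different models are already available as an array with one row per model; the function name `bottom_k_mean` and this array layout are hypothetical choices for illustration, not taken from the paper, and the sketch does not cover how CMBAC actually trains its models or Q-functions.

```python
import numpy as np


def bottom_k_mean(q_estimates, k):
    """Average the k smallest Q estimates for each state-action pair.

    q_estimates: array-like of shape (num_models, batch), one row per
                 model-specific Q estimate (hypothetical layout).
    k:           number of lowest estimates to average, 1 <= k <= num_models.
    """
    q = np.asarray(q_estimates, dtype=float)
    if not 1 <= k <= q.shape[0]:
        raise ValueError("k must be between 1 and the number of estimates")
    # Sort along the model axis so the k lowest estimates come first.
    lowest_k = np.sort(q, axis=0)[:k]
    # Averaging only the low estimates gives a conservative value.
    return lowest_k.mean(axis=0)


# Three hypothetical models, two state-action pairs.
q_values = [[1.0, 4.0],
            [3.0, 2.0],
            [5.0, 0.0]]
conservative_q = bottom_k_mean(q_values, k=2)
print(conservative_q)  # [2. 1.]
```

With `k` equal to the number of estimates, this reduces to the plain ensemble mean; with `k=1`, it reduces to the minimum, so `k` controls how conservative the resulting value is.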
Abstract
Model-based reinforcement learning algorithms, which aim to learn a model of the environment to make decisions, are more sample efficient than their model-free counterparts. The sample efficiency of model-based a