BriefGPT.xyz
Jun, 2022
Guided Exploration in Reinforcement Learning via Monte Carlo Critic Optimization
Igor Kuznetsov
TL;DR
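This paper proposes a guided exploration method based on a differential directional controller that applies scalable exploratory action correction. The method improves on the efficiency of traditional exploration schemes, and the paper also presents a new algorithm that uses the controller to modify both the policy and the critic, outperforming modern reinforcement learning algorithms on the DMControl suite.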
Abstract
The class of deep deterministic off-policy algorithms is effectively applied to solve challenging continuous control problems. However, current approaches use random noise as a common