BriefGPT.xyz
May, 2021
Action Candidate Based Clipped Double Q-learning for Discrete and Continuous Action Tasks
Haobo Jiang, Jin Xie, Jian Yang
TL;DR
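This paper proposes an action-candidate-based clipped double estimator to reduce the underestimation bias of clipped Double Q-learning. Experiments show that it estimates the maximum expected action value more accurately and performs well on several benchmark problems.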
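The idea in the summary above can be illustrated with a minimal sketch. It assumes two independent sets of action-value estimates, a candidate set made of the top-K actions under one of them, and a clipping step against the maximum of the other. The function name, the parameter `k`, and the exact clipping rule are illustrative assumptions and are not taken from this page.

```python
import numpy as np

def candidate_clipped_double_estimate(q_a, q_b, k):
    """Hypothetical sketch of a clipped double estimate restricted to
    action candidates. q_a and q_b are two independent estimates of the
    action values; k is the assumed number of candidates."""
    q_a = np.asarray(q_a, dtype=float)
    q_b = np.asarray(q_b, dtype=float)
    if not 1 <= k <= q_a.size or q_a.shape != q_b.shape:
        raise ValueError("k must be in [1, n_actions] and shapes must match")

    # Candidates: the k actions with the highest values under q_b.
    candidates = np.argsort(q_b)[-k:]

    # Among the candidates, pick the action that q_a rates highest.
    best = candidates[np.argmax(q_a[candidates])]

    # Clip q_b's value for that action by the maximum of q_a.
    return min(q_b[best], q_a.max())
```

In this sketch, `k` equal to the number of actions reduces to an ordinary clipped double estimate, while smaller `k` confines the selection to actions that `q_b` already rates highly, which is one way the underestimation could be reduced.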
Abstract
Double Q-learning is a popular reinforcement learning algorithm in Markov decision process (MDP) problems. Clipped …