BriefGPT.xyz
Nov, 2021
Adaptively Calibrated Critic Estimates for Deep Reinforcement Learning
Nicolai Dorka, Joschka Boedecker, Wolfram Burgard
TL;DR
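The paper proposes a method called Adaptively Calibrated Critics (ACC) and applies it to Truncated Quantile Critics, where it adaptively adjusts a parameter to remove the bias of low-variance temporal difference targets, and it sets a new state of the art on the OpenAI gym continuous control benchmark.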
Abstract
Accurate value estimates are important for off-policy reinforcement learning. Algorithms based on temporal difference learning typically are prone to an over- or underestimation bias building up over time. In thi