BriefGPT.xyz
Sep, 2022
Exploration in Reinforcement Learning: A Reward Mechanism Based on Episodic Visitation Discrepancy
Rewarding Episodic Visitation Discrepancy for Exploration in Reinforcement Learning
Mingqi Yuan, Bo Li, Xin Jin, Wenjun Zeng
TL;DR
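This study proposes Rewarding Episodic Visitation Discrepancy (REVD), a computationally efficient and quantified exploration method for environments with high-dimensional observations and sparse rewards. The results show that REVD can significantly improve the sample efficiency of reinforcement learning algorithms and outperforms the baseline methods.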
Abstract
Exploration is critical for deep reinforcement learning in complex environments with high-dimensional observations and sparse rewards. To address this problem, recent approaches proposed to leverage