BriefGPT.xyz
Oct, 2022
MEET: A Monte Carlo Exploration-Exploitation Trade-off for Buffer Sampling
Julius Ott, Lorenzo Servadei, Jose Arjona-Medina, Enrico Rinaldi, Gianfranco Mauro...
TL;DR
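This paper proposes a new sampling strategy that uses uncertainty estimates of the Q-value function to guide sampling toward more significant transitions, so that a more efficient policy is learned. Experiments show that, across a range of environments, the method outperforms existing strategies by 26% on average in convergence and peak performance.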
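To make the idea of uncertainty-guided buffer sampling concrete, here is a minimal sketch. It is not the paper's formula, which the summary above does not state. It assumes that each stored transition has several Q-value estimates (for example from a hypothetical set of Q-heads), and it scores a transition by the mean of those estimates plus a weighted standard deviation. The names `uncertainty_priorities`, `sample_indices`, and `explore_weight` are illustrative assumptions.

```python
import random
import statistics


def uncertainty_priorities(q_estimates, explore_weight=1.0):
    """Score each transition from several Q-value estimates.

    q_estimates[i] holds the Q-values that different (hypothetical)
    Q-heads assign to transition i. The scoring rule is an assumption
    made for illustration, not the rule used in the paper.
    """
    priorities = []
    for heads in q_estimates:
        mean_q = statistics.fmean(heads)   # exploitation signal
        std_q = statistics.pstdev(heads)   # uncertainty (exploration) signal
        priorities.append(mean_q + explore_weight * std_q)
    # Shift the scores so every sampling weight is strictly positive.
    lowest = min(priorities)
    return [p - lowest + 1e-6 for p in priorities]


def sample_indices(priorities, batch_size, rng=random):
    """Draw buffer indices with probability proportional to priority."""
    return rng.choices(range(len(priorities)), weights=priorities, k=batch_size)


# Three transitions: confident, highly uncertain, and low-valued.
q_estimates = [[1.0, 1.0, 1.0], [0.5, 1.5, 2.5], [0.2, 0.2, 0.3]]
prios = uncertainty_priorities(q_estimates)
batch = sample_indices(prios, batch_size=4, rng=random.Random(0))
```

In this sketch the second transition gets the largest weight because its Q-estimates disagree the most, so it is drawn most often. Raising `explore_weight` shifts sampling further toward uncertain transitions, and lowering it favors transitions with high estimated value.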
Abstract
Data selection is essential for any data-based optimization technique, such as reinforcement learning. State-of-the-art sampling strategies for the