Feb, 2014
Generalization and Exploration via Randomized Value Functions
Ian Osband, Benjamin Van Roy, Zheng Wen
TL;DR
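This paper proposes RLSVI (randomized least-squares value iteration), a new reinforcement learning algorithm that explores and generalizes using linearly parameterized value functions. Compared with Boltzmann or epsilon-greedy exploration, RLSVI achieves significant efficiency gains, and it shows near-optimal performance in a tabula rasa learning setting. The results suggest that randomized value functions are key to effective exploration and generalization in reinforcement learning.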
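The idea of exploring through a randomized, linearly parameterized value function can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a Gaussian prior on the weights and Gaussian observation noise, covers a single regression step rather than the full value iteration, and the names `sample_value_weights`, `greedy_action`, `noise_std`, and `prior_precision` are hypothetical. The agent samples one plausible weight vector from the posterior and then acts greedily with respect to it, instead of adding random noise to its actions.

```python
import numpy as np


def sample_value_weights(features, targets, noise_std=1.0, prior_precision=1.0, rng=None):
    """Draw one weight vector from a Bayesian linear regression posterior.

    Assumes (for illustration only) a zero-mean Gaussian prior with
    precision `prior_precision` and Gaussian noise with std `noise_std`.
    """
    rng = np.random.default_rng() if rng is None else rng
    features = np.asarray(features, dtype=float)
    targets = np.asarray(targets, dtype=float)
    dim = features.shape[1]

    # Posterior precision: data term plus prior term.
    precision = features.T @ features / noise_std**2 + prior_precision * np.eye(dim)
    covariance = np.linalg.inv(precision)
    # Symmetrize to guard against small numerical asymmetries.
    covariance = (covariance + covariance.T) / 2.0
    mean = covariance @ features.T @ targets / noise_std**2

    # The random draw is what drives exploration: weights that are
    # poorly determined by the data vary more from sample to sample.
    return rng.multivariate_normal(mean, covariance)


def greedy_action(weights, action_features):
    """Pick the action whose feature vector has the highest sampled value."""
    return int(np.argmax(np.asarray(action_features, dtype=float) @ weights))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two actions with one-hot features; action 1 has yielded higher targets.
    feats = np.tile(np.eye(2), (500, 1))
    targs = np.tile(np.array([0.0, 1.0]), 500)
    w = sample_value_weights(feats, targs, noise_std=0.1, rng=rng)
    print("sampled weights:", w)
    print("greedy action:", greedy_action(w, np.eye(2)))
```

With little data the sampled weights vary widely, so different actions get tried. As data accumulates the posterior concentrates and the greedy choice stabilizes.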
Abstract
We consider the problem of reinforcement learning with an orientation toward contexts in which an agent must generalize from past experience and explore to reduce uncertainty. We propose an approach to exploration based on ...