Generalization and Exploration via Randomized Value Functions

Feb, 2014

Generalization and Exploration via Randomized Value Functions

Benjamin Van Roy, Zheng Wen

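TL;DR: This paper proposes a new RL algorithm, RLSVI (randomized least-squares value iteration), which explores and generalizes using linearly parameterized value functions. Compared with Boltzmann or epsilon-greedy exploration, RLSVI achieves substantial efficiency gains, and it shows near-optimal performance in tabula rasa learning settings. The results suggest that randomized value functions are key to efficient exploration and generalization in reinforcement learning.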
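To make the idea of a randomized, linearly parameterized value function concrete, here is a minimal sketch of a single sampling step. It assumes a Gaussian prior with precision `lam` and observation noise `sigma`. The function names, parameter names, and toy data are illustrative assumptions rather than the authors' code. The full algorithm repeats a regression of this kind backward over the time steps of an episode.

```python
import numpy as np

def sample_value_weights(features, targets, sigma=1.0, lam=1.0, rng=None):
    """Draw one weight vector from the Gaussian posterior of a Bayesian
    linear regression of value targets on state-action features.

    features: (n, d) array, targets: (n,) array.
    sigma and lam are assumed hyperparameters (noise scale, prior precision).
    """
    rng = np.random.default_rng() if rng is None else rng
    d = features.shape[1]
    # Posterior precision and covariance of the weights.
    precision = features.T @ features / sigma**2 + lam * np.eye(d)
    cov = np.linalg.inv(precision)
    # Posterior mean (a regularized least-squares solution).
    mean = cov @ features.T @ targets / sigma**2
    # The randomization: sample weights instead of using the mean.
    return rng.multivariate_normal(mean, cov)

def greedy_action(action_features, theta):
    """Pick the action whose feature row has the highest sampled value."""
    return int(np.argmax(action_features @ theta))

# Toy usage with hypothetical data: two one-hot features, noiseless targets.
rng = np.random.default_rng(0)
features = np.tile(np.eye(2), (1000, 1))
targets = features @ np.array([1.0, -2.0])
theta = sample_value_weights(features, targets, sigma=0.1, lam=1.0, rng=rng)
action = greedy_action(np.eye(2), theta)
```

With little data the sampled weights vary widely, so greedy actions with respect to them differ from episode to episode, which drives exploration. As data accumulates the posterior concentrates and behavior becomes nearly greedy. Epsilon-greedy, by contrast, perturbs individual actions independently of how uncertain the value estimates are.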

Abstract

We consider the problem of reinforcement learning with an orientation toward contexts in which an agent must generalize from past experience and explore to reduce uncertainty. We propose an approach to exploration based on →