Kernel-Based Reinforcement Learning: A Finite-Time Analysis

Apr, 2020

Regret Bounds for Kernel-Based Reinforcement Learning

Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, Emilie Kaufmann, Michal Valko

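TL;DR This paper proposes Kernel-UCBVI, a kernel-based optimistic algorithm that uses smoothing kernels to estimate the rewards and transitions of the MDP. By balancing exploration and exploitation efficiently, it addresses the exploration-exploitation dilemma in finite-horizon reinforcement learning. The method is validated experimentally on continuous MDPs.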
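The kernel-smoothing idea in the TL;DR can be illustrated with a minimal sketch: at a query (state, action) pair, rewards and next-state distributions are estimated as kernel-weighted averages of past transitions, and an optimistic bonus shrinks as the total kernel weight near the query grows. The Gaussian kernel, the Euclidean metric on (state, action) tuples, the regularizer `beta`, the `bonus_scale` constant, and the function names are all hypothetical choices for illustration, not the paper's exact algorithm or constants.

```python
import math


def gaussian_kernel(distance, bandwidth):
    """Gaussian kernel exp(-d^2 / (2 * bandwidth^2)); an assumed kernel choice."""
    return math.exp(-(distance ** 2) / (2.0 * bandwidth ** 2))


def kernel_estimates(query, data, bandwidth=0.5, beta=0.01, bonus_scale=1.0):
    """Hypothetical kernel-smoothed estimates with an optimistic bonus.

    query: (state, action) pair encoded as a tuple of floats (assumption).
    data: list of (state_action, reward, next_state) observed transitions.
    Returns (reward_estimate, transition_estimate, bonus).
    """
    # Kernel weight of each past sample, using the Euclidean distance
    # on the state-action space as the metric (assumption).
    weights = [gaussian_kernel(math.dist(query, sa), bandwidth) for sa, _, _ in data]

    # Regularized "generalized count": beta keeps it positive with no nearby data.
    total = beta + sum(weights)
    norm = [w / total for w in weights]

    # Reward estimate: weighted average of observed rewards.
    reward_hat = sum(w * r for w, (_, r, _) in zip(norm, data))

    # Transition estimate: weighted empirical distribution over observed
    # next states, stored as (weight, next_state) pairs.
    transition_hat = [(w, s_next) for w, (_, _, s_next) in zip(norm, data)]

    # Exploration bonus that decreases as more kernel weight accumulates.
    bonus = bonus_scale / math.sqrt(total)
    return reward_hat, transition_hat, bonus


if __name__ == "__main__":
    samples = [((0.0, 0.0), 1.0, (0.1,)), ((1.0, 0.0), 0.0, (0.9,))]
    r_hat, p_hat, b = kernel_estimates((0.0, 0.0), samples)
    print(r_hat, b)  # r_hat is about 0.873
```

Because of `beta`, the normalized weights sum to slightly less than 1; in an optimistic planner, the bonus would be added to the estimated value at the query point, so poorly visited regions look more attractive and get explored.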

Abstract

We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning problems whose state-action space is endowed with a met