The Sample-Complexity of General Reinforcement Learning

Aug, 2013

Tor Lattimore, Marcus Hutter, Peter Sunehag

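TL;DR: This paper presents a new algorithm for general reinforcement learning in the setting where the true environment belongs to a finite class of N arbitrary models. The algorithm is shown to be near-optimal for all but O(N log^2 N) time-steps. Infinite classes are also considered: compactness is shown to be a key criterion for the existence of uniform sample-complexity bounds. A matching lower bound is given for the finite case.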

Abstract

We present a new algorithm for general reinforcement learning where the true environment is known to belong to a finite class of N arbitrary mode