Finite-Time Analysis of Temporal Difference Learning with Linear Function Approximation

Jun, 2018

A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation

Jalaj Bhandari, Daniel Russo, Raghav Singal

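TL;DR: This paper gives a simple, explicit finite-time analysis of temporal difference learning with linear function approximation and examines its applicability in reinforcement learning. The results also extend to TD(λ) and to Q-learning applied to high-dimensional optimal stopping problems.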
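
For readers unfamiliar with the algorithm being analyzed, here is a minimal pure-Python sketch of online TD(λ) with a linear value estimate V(s) = θᵀφ(s). It is an illustration, not the authors' code. The fixed policy is assumed to be folded into a transition matrix P, so the process is a Markov reward process. The function name `td_lambda_linear`, the reward convention R[s] (reward received when leaving s), and the constant step size α are all assumptions made for this example. Each step computes the TD error δ = r + γ θᵀφ(s') − θᵀφ(s), updates the eligibility trace z ← γλ z + φ(s), and moves θ ← θ + α δ z. Setting λ = 0 recovers plain TD(0).

```python
import random
from typing import List, Sequence


def td_lambda_linear(
    P: Sequence[Sequence[float]],    # P[s][s']: transition probabilities under the fixed policy
    R: Sequence[float],              # R[s]: reward received when leaving state s (assumed convention)
    phi: Sequence[Sequence[float]],  # phi[s]: feature vector of state s
    gamma: float,                    # discount factor
    lam: float,                      # trace-decay parameter; lam = 0 gives TD(0)
    alpha: float,                    # constant step size (a simplifying assumption)
    n_steps: int,
    seed: int = 0,
) -> List[float]:
    """Hypothetical sketch: online TD(lambda) with linear value V(s) = theta . phi[s]."""
    rng = random.Random(seed)
    n_states = len(P)
    d = len(phi[0])
    theta = [0.0] * d
    z = [0.0] * d  # eligibility trace
    s = rng.randrange(n_states)

    def value(state: int) -> float:
        return sum(t * f for t, f in zip(theta, phi[state]))

    for _ in range(n_steps):
        # Sample the next state from the policy-induced Markov chain.
        s_next = rng.choices(range(n_states), weights=P[s])[0]
        # TD error: delta = r + gamma * V(s') - V(s)
        delta = R[s] + gamma * value(s_next) - value(s)
        # Accumulating trace: z <- gamma * lam * z + phi(s)
        z = [gamma * lam * zi + fi for zi, fi in zip(z, phi[s])]
        # Semi-gradient update: theta <- theta + alpha * delta * z
        theta = [ti + alpha * delta * zi for ti, zi in zip(theta, z)]
        s = s_next
    return theta


if __name__ == "__main__":
    P = [[0.0, 1.0], [1.0, 0.0]]    # deterministic two-state cycle
    R = [1.0, 0.0]
    phi = [[1.0, 0.0], [0.0, 1.0]]  # one-hot (tabular) features
    theta = td_lambda_linear(P, R, phi, gamma=0.5, lam=0.0, alpha=0.1, n_steps=5000)
    print(theta)  # should be close to the true values [4/3, 2/3]
```

With one-hot features, linear TD reduces to tabular TD. In that case the fixed point is the true value function, here V(0) = 4/3 and V(1) = 2/3. With general features, TD instead converges to a projected fixed point, which need not equal the true values.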

Abstract

Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value function corresponding to a given policy in a Markov decision process. Although TD is one of the most widely used algor