A Variance Minimization Approach to Temporal-Difference Learning

Nov, 2024

Xingguo Chen, Yu Gong, Shangdong Yang, Wenhao Wang

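TL;DR: This work addresses the need for fast-converging algorithms in reinforcement learning, in particular the slow convergence of traditional value-based algorithms. It proposes a variance-minimization approach with two objectives, the Variance of Bellman Error (VBE) and the Variance of Projected Bellman Error (VPBE), and derives several algorithms from them. Experiments demonstrate the effectiveness and convergence of these algorithms and show their advantage in policy optimization.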

Abstract

Fast-converging algorithms are a contemporary requirement in Reinforcement Learning. In the context of linear function approximation, the magnitude of the smallest eigenvalue of the key matrix is a major factor reflecting the →