Jan, 2024

Regularized Q-Learning with Linear Function Approximation

Jiachen Xi, Alfredo Garcia, Petar Momcilovic

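TL;DR: This paper proposes a single-loop algorithm that minimizes the projected Bellman error, with finite-time convergence under linear function approximation. The algorithm converges to a stationary point in the presence of Markovian noise, and the paper provides a performance guarantee for the policies derived from it.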

Abstract

Several successful reinforcement learning algorithms make use of regularization to promote multi-modal policies that exhibit enhanced expl