Oct, 2023
On Double-Descent in Reinforcement Learning with LSTD and Random Features
David Brellmann, Eloïse Berthier, David Filliat, Goran Frehse
TL;DR
Through theoretical analysis and numerical experiments, the paper studies how the performance of Temporal Difference algorithms in deep reinforcement learning is affected by the size of the neural network and by $l_2$-regularization. It identifies the ratio of parameters to states as a crucial factor and observes a double-descent phenomenon, i.e., a sudden drop in performance when the parameter/state ratio equals one.
Abstract
Temporal Difference (TD) algorithms are widely used in deep reinforcement learning (RL). Their performance is heavily influenced by the size of the neural network. While in supervised learning the regime of over-parameterization and its benefits are well understood, the situation in RL is much less clear. The paper analyzes the regularized Least-Square Temporal Difference (LSTD) algorithm with random features, identifies the ratio between the number of parameters and the number of visited states as a crucial factor, and observes a double-descent phenomenon, i.e., a sudden drop in performance when this ratio is close to one.
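To make the quantities in the abstract concrete, below is a minimal numerical sketch of regularized LSTD with random features, sweeping the parameters/states ratio around one. Everything in it is an assumption for illustration, not the paper's setup: synthetic Gaussian states and rewards, frozen ReLU random features, and the averaged squared TD error as a simple proxy for the empirical Mean-Squared Bellman Error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem (not the paper's environments).
d_state = 5        # raw state dimension
m_states = 200     # number of visited states / transitions
gamma = 0.95       # discount factor
reg = 1e-3         # l2-regularization strength

# Synthetic transitions: current states S, next states S_next, rewards r.
S = rng.normal(size=(m_states, d_state))
S_next = rng.normal(size=(m_states, d_state))
r = rng.normal(size=m_states)

def random_features(X, W):
    """Frozen random ReLU features: phi(s) = max(0, W s) / sqrt(N)."""
    return np.maximum(W @ X.T, 0.0).T / np.sqrt(W.shape[0])

def lstd(Phi, Phi_next, r, gamma, reg):
    """Regularized LSTD: solve (Phi^T (Phi - gamma Phi') + reg I) theta = Phi^T r."""
    N = Phi.shape[1]
    A = Phi.T @ (Phi - gamma * Phi_next) + reg * np.eye(N)
    return np.linalg.solve(A, Phi.T @ r)

def td_error_proxy(Phi, Phi_next, r, theta, gamma):
    """Mean squared TD error on visited transitions (a proxy, not the paper's MSBE)."""
    td = r + gamma * Phi_next @ theta - Phi @ theta
    return np.mean(td ** 2)

# Sweep the parameters/states ratio N/m around 1, where double descent is expected.
for ratio in [0.25, 0.5, 0.9, 1.0, 1.1, 2.0, 4.0]:
    N = int(ratio * m_states)          # number of random features (parameters)
    W = rng.normal(size=(N, d_state))  # random first layer, kept fixed
    Phi, Phi_next = random_features(S, W), random_features(S_next, W)
    theta = lstd(Phi, Phi_next, r, gamma, reg)
    print(f"N/m = {ratio:4.2f}  TD-error proxy = {td_error_proxy(Phi, Phi_next, r, theta, gamma):.4f}")
```

Per the abstract, the error peak near N/m = 1 should flatten as the $l_2$-regularization `reg` is increased, which can be checked by rerunning the sweep with a larger value.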