Vanishing Bias Heuristic-guided Reinforcement Learning Algorithm

Jun, 2023

Qinru Li, Hao Xiang

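TL;DR: This work studies classical and neural-network-based reinforcement learning methods in the lunar lander environment and proposes a new algorithm, Heuristic RL, which introduces heuristics to guide the early stage of training while mitigating the influence of human bias. Experimental results show that our proposed method performs well in the lunar lander environment.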

Abstract

Reinforcement learning has achieved tremendous success in many Atari games. In this paper, we explored the lunar lander environment and implemented classical methods including →