April 2024

Recursive Backwards Q-Learning in Deterministic Environments

Jan Diekhoff, Jörn Fischer

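TL;DR: This work proposes the recursive backwards Q-learning (RBQL) agent, which adds a model-based component: the agent explores and builds a model of its environment in order to solve deterministic problems more effectively. After reaching a terminal state, it recursively propagates that state's value backwards through the model, so that every state is evaluated at its optimal value without a lengthy learning process. In the example of finding the shortest path through a maze, the agent clearly outperforms a regular Q-learning agent.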
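The backward propagation step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `backward_propagate`, the dictionary layout of the model, the reward scheme (0 per step, 1 on reaching the goal), the discount factor of 0.9, and the four-state corridor are all assumptions chosen for illustration. The sketch also assumes uniform step rewards, so that a single breadth-first pass backwards from the terminal state is enough to fix every value.

```python
from collections import defaultdict, deque


def backward_propagate(model, terminal, gamma=0.9):
    """Propagate values backwards from `terminal` through a deterministic model.

    `model` maps (state, action) -> (next_state, reward) and stands in for
    the transitions an agent has recorded while exploring (hypothetical layout).
    """
    # Index the model by successor so it can be walked backwards.
    predecessors = defaultdict(list)
    for (state, action), (next_state, reward) in model.items():
        predecessors[next_state].append((state, action, reward))

    q = defaultdict(dict)  # q[state][action] -> value
    visited = {terminal}
    queue = deque([terminal])
    while queue:
        next_state = queue.popleft()
        # The terminal state has value 0; any other state uses its best known action.
        if next_state == terminal:
            next_value = 0.0
        else:
            next_value = max(q[next_state].values())
        for state, action, reward in predecessors[next_state]:
            # Deterministic transition, so the update overwrites the old value
            # instead of averaging with a learning rate.
            q[state][action] = reward + gamma * next_value
            if state not in visited:
                visited.add(state)
                queue.append(state)
    return q


# Hypothetical corridor 0 - 1 - 2 - 3 with the goal at state 3.
model = {
    (0, "right"): (1, 0.0),
    (1, "left"): (0, 0.0),
    (1, "right"): (2, 0.0),
    (2, "left"): (1, 0.0),
    (2, "right"): (3, 1.0),
}
q = backward_propagate(model, terminal=3)
policy = {state: max(actions, key=actions.get) for state, actions in q.items()}
```

In this corridor the greedy action in every state is "right", and it is found after a single pass over the model rather than after many episodes of incremental updates.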

Abstract

Reinforcement learning is a popular method of finding optimal solutions to complex problems. Algorithms like Q-learning excel at learning to solve stochastic problems without a model of their environment. However