BriefGPT.xyz
Apr, 2024
Recursive Backwards Q-Learning in Deterministic Environments
Jan Diekhoff, Jörn Fischer
TL;DR
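This work proposes the recursive backwards Q-learning (RBQL) agent, which adds a model-based component: the agent explores the environment and builds a model of it in order to better solve deterministic problems. After reaching a terminal state, the agent recursively propagates its value backwards through this model, which yields the optimal value for every state and avoids a lengthy learning process. In the example of finding the shortest path through a maze, the agent clearly outperforms a regular Q-learning agent.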
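The backward propagation step summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `backward_propagate`, the discount factor `GAMMA`, the transition-dictionary layout, and the toy corridor are all assumptions made for illustration. It also uses an iterative breadth-first walk rather than explicit recursion, and it assumes maze-style rewards (a uniform reward per step plus a reward on entering the terminal state), so that the first value found for a state is already its best one.

```python
from collections import defaultdict, deque

GAMMA = 0.9  # assumed discount factor, not taken from the paper


def backward_propagate(transitions, terminal, gamma=GAMMA):
    """Propagate values backwards from `terminal` through a recorded model.

    transitions: dict mapping (state, action) -> (next_state, reward),
    i.e. the deterministic model the agent built while exploring.
    Returns a dict q[(state, action)] of action values.
    """
    # Index the model by successor state so it can be walked backwards.
    predecessors = defaultdict(list)
    for (state, action), (next_state, reward) in transitions.items():
        predecessors[next_state].append((state, action, reward))

    q = {}
    value = {terminal: 0.0}  # best value of each state reached so far
    queue = deque([terminal])
    while queue:
        next_state = queue.popleft()
        for state, action, reward in predecessors[next_state]:
            # Deterministic transitions: a learning rate of 1 is enough.
            q[(state, action)] = reward + gamma * value[next_state]
            if state not in value:
                # Breadth-first order reaches each state first via a
                # shortest path, so this value is final under the
                # maze-style reward assumption stated above.
                value[state] = q[(state, action)]
                queue.append(state)
    return q


# Hypothetical corridor 0 - 1 - 2 - 3, where state 3 is terminal.
model = {
    (0, "right"): (1, 0.0),
    (1, "right"): (2, 0.0),
    (2, "right"): (3, 1.0),
    (1, "left"): (0, 0.0),
    (2, "left"): (1, 0.0),
}
q = backward_propagate(model, terminal=3)


def greedy_action(q_values, state):
    """Return the highest-valued recorded action for `state`."""
    candidates = {a: v for (s, a), v in q_values.items() if s == state}
    return max(candidates, key=candidates.get)
```

A single pass over the recorded model assigns a value to every state-action pair from which the terminal state is reachable, whereas a model-free Q-learning agent would typically need many episodes to push the reward back along the same path.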
Abstract
Reinforcement learning is a popular method of finding optimal solutions to complex problems. Algorithms like Q-learning excel at learning to solve stochastic problems without a model of their environment. However
→