Oct, 2022
Rethinking Value Function Learning for Generalization in Reinforcement Learning
Seungyong Moon, JunYeong Lee, Hyun Oh Song
TL;DR
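This work trains RL agents across multiple visual environments to improve observational generalization. It proposes the Delayed-Critic Policy Gradient (DCPG) algorithm, which can be implemented with a single unified network architecture and substantially improves both sample efficiency and observational generalization on the Procgen benchmark.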
Abstract
We focus on the problem of training RL agents on multiple training environments to improve observational generalization performance. In prior methods, policy and value networks are separately optimized using a di