BriefGPT.xyz
Oct, 2021
Generalized Proximal Policy Optimization with Sample Reuse
James Queeney, Ioannis Ch. Paschalidis, Christos G. Cassandras
TL;DR
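This work aims to retain theoretical policy improvement guarantees while making decisions with high data efficiency. By generalizing proximal policy optimization to reuse samples effectively, it balances stability against sample efficiency and improves performance as a result.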
Abstract
In real-world decision making tasks, it is critical for data-driven reinforcement learning methods to be both stable and sample efficient. On-policy methods typically generate reliable policy improvement througho