BriefGPT.xyz
May, 2023
Policy Optimization for Continuous Reinforcement Learning
Hanyang Zhao, Wenpin Tang, David D. Yao
TL;DR
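This paper studies reinforcement learning in the setting of continuous time and space, introduces the notion of occupation time for a discounted objective, and applies it to policy gradient and TRPO/PPO methods. Numerical experiments demonstrate the effectiveness and advantages of the approach.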
Abstract
We study reinforcement learning (RL) in the setting of continuous time and space, for an infinite horizon with a discounted objective and …