Oct, 2020
Iterative Amortized Policy Optimization
Joseph Marino, Alexandre Piché, Alessandro Davide Ialongo, Yisong Yue
TL;DR
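This study examines the policy networks used in deep reinforcement learning algorithms for continuous control, treating them as a form of amortization, and proposes an iterative amortized optimization technique to improve performance.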
Abstract
Policy networks are a central feature of deep reinforcement learning (RL) algorithms for continuous control, enabling the estimation and s…