BriefGPT.xyz
Sep, 2023
Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control
Nate Rahn, Pierluca D'Oro, Harley Wiltzer, Pierre-Luc Bacon, Marc G. Bellemare
TL;DR
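By studying the return landscape, we offer a new perspective on the unstable behavior of deep reinforcement learning agents for continuous control and reveal a dimension of policy quality. Finally, we develop a distribution-aware procedure to improve policy robustness.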
Abstract
Deep reinforcement learning agents for continuous control are known to exhibit significant instability in their performance over time. In this work, we provide a fresh perspective on these behaviors by studying t