BriefGPT.xyz
Feb, 2023
Efficient exploration via epistemic-risk-seeking policy optimization
Brendan O'Donoghue
TL;DR
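This paper proposes an exploration algorithm based on expected risk. By training neural networks and optimizing the policy, it enables the agent to explore unknown states, and it performs well in deep reinforcement learning.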
Abstract
Exploration remains a key challenge in deep reinforcement learning (RL). Optimism in the face of uncertainty is a well-known heuristic wit