BriefGPT.xyz
Feb, 2023
Regret-Based Optimization for Robustness in Reinforcement Learning
Robust Deep Reinforcement Learning through Regret Neighborhoods
Roman Belaire, Pradeep Varakantham, David Lo
TL;DR
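This paper proposes a more proactive approach to improving robustness in deep reinforcement learning: it uses minimization of the maximum regret as the optimization objective and shows that this approach can significantly improve performance.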
Abstract
Deep reinforcement learning (DRL) policies have been shown to be vulnerable to small adversarial noise in observations. Such adversarial noise …