BriefGPT.xyz
Mar, 2024
Scale-free Adversarial Reinforcement Learning
Mingyu Chen, Xuezhou Zhang
TL;DR
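This work studies scale-free learning in Markov decision processes and proposes a generic algorithmic framework (SCB), which it applies to both adversarial multi-armed bandits and adversarial Markov decision processes. The framework yields the first minimax-optimal expected regret bound and the first high-probability regret bound for scale-free adversarial multi-armed bandits, as well as the first scale-free reinforcement learning algorithm with a $\tilde{\mathcal{O}}(\sqrt{T})$ high-probability regret guarantee.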
Abstract
This paper initiates the study of scale-free learning in Markov decision processes (MDPs), where the scale of rewards/losses is unknown to the learner. We design a generic algorithmic framework, \underline{S}cale