BriefGPT.xyz
Nov, 2019
A Reduction from Reinforcement Learning to No-Regret Online Learning
Ching-An Cheng, Remi Tachet des Combes, Byron Boots, Geoff Gordon
TL;DR
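Proposes a reduction from reinforcement learning to no-regret online learning based on the saddle-point formulation of RL. The reduction decomposes the RL problem into two parts, regret minimization and function approximation, and the paper points out why this reduction matters.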
Abstract
We present a reduction from reinforcement learning (RL) to no-regret online learning based on the saddle-point formulation of RL, by which "any" online algorithm with sublinear regret can generate policies with p