BriefGPT.xyz
Nov, 2023
Anytime-Constrained Reinforcement Learning
Jeremy McMahan, Xiaojin Zhu
TL;DR
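We introduce and study constrained Markov decision processes (cMDPs) with anytime constraints. We present a fixed-parameter tractable reduction from anytime-constrained cMDPs to unconstrained MDPs. We design planning and learning algorithms for tabular cMDPs, as well as approximation algorithms that efficiently compute or learn an approximately feasible policy.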
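The summary does not spell out how the reduction works. As a rough illustration only, the sketch below assumes one natural construction: augment each state with the cumulative cost spent so far, forbid any action that would push that total over the budget, and solve the resulting unconstrained finite-horizon MDP by backward induction. The function name `solve_anytime_constrained`, the data layout (`P`, `r`, `c`), and the restriction to nonnegative integer costs that are deterministic given the state and action are all assumptions made for this example, not details taken from the paper.

```python
import math


def solve_anytime_constrained(P, r, c, budget, horizon, s0):
    """Backward induction over augmented states (state, cumulative cost).

    P[s][a] : dict mapping next_state -> probability
    r[s][a] : reward for taking action a in state s
    c[s][a] : nonnegative integer cost (assumed deterministic given s, a)
    Returns (value from s0 with nothing spent, policy keyed by (step, state, spent)).
    A value of -inf means no policy can respect the budget at every step.
    """
    n_states = len(P)
    # V[(s, k)]: best value-to-go from state s with k units of budget already spent.
    V = {(s, k): 0.0 for s in range(n_states) for k in range(budget + 1)}
    policy = {}
    for h in reversed(range(horizon)):
        V_new = {}
        for s in range(n_states):
            for k in range(budget + 1):
                best, best_a = -math.inf, None
                for a in range(len(P[s])):
                    k_next = k + c[s][a]
                    if k_next > budget:
                        continue  # this action would violate the budget right now
                    q = float(r[s][a])
                    for s2, p in P[s][a].items():
                        if p > 0:  # skip zero-probability successors (avoids 0 * -inf)
                            q += p * V[(s2, k_next)]
                    if q > best:
                        best, best_a = q, a
                V_new[(s, k)] = best
                policy[(h, s, k)] = best_a
        V = V_new
    return V[(s0, 0)], policy


# Toy example (hypothetical): one state, a free "safe" action worth 1
# and a "risky" action worth 3 that costs 1 unit of budget.
P = [[{0: 1.0}, {0: 1.0}]]
r = [[1, 3]]
c = [[0, 1]]
value, policy = solve_anytime_constrained(P, r, c, budget=2, horizon=3, s0=0)
print(value)
```

The augmented table has one entry per (state, spent budget) pair, so under these assumptions its size grows with the range of possible cumulative costs as well as with the number of states.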
Abstract
We introduce and study constrained Markov decision processes (cMDPs) with anytime constraints. An anytime constraint requires the agent to never violate its budget at any point in time, almost surely. Although Ma