BriefGPT.xyz
Jan, 2022
Reinforcement Learning for Task Specifications with Action-Constraints
Arun Raman, Keerthan Shagrithaya, Shalabh Bhatnagar
TL;DR
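This paper uses concepts from the supervisory control theory of discrete event systems to propose a method for learning optimal control policies in a finite-state Markov decision process, and draws on the development of reward machines to handle state constraints. An example illustrates the method's applicability, and simulation results are presented for this setting.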
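As a rough illustration of the idea summarized above, the following is a minimal sketch of tabular Q-learning in which a supervisor, driven by a finite automaton over actions, disables actions that would violate a constraint. Everything concrete here is an assumption made for illustration and is not taken from the paper: the corridor MDP, the constraint "never take `right` twice in a row", the reward values, and the names `dfa_step`, `enabled_actions`, `train`, and `greedy_rollout`. Reward machines for state constraints are not modeled.

```python
import random

# Hypothetical toy MDP: a corridor with states 0..4, goal at state 4.
N_STATES = 5
GOAL = 4
ACTIONS = ("left", "right", "wait")

# Hypothetical action constraint, encoded as a finite automaton over actions:
# automaton state q counts consecutive "right" actions; q == 2 is unsafe.
UNSAFE = 2


def dfa_step(q, action):
    """Advance the constraint automaton by one action."""
    return q + 1 if action == "right" else 0


def enabled_actions(q):
    """Supervisor: keep only actions that do not lead to the unsafe state."""
    return [a for a in ACTIONS if dfa_step(q, a) != UNSAFE]


def step(s, action):
    """Deterministic corridor dynamics with an assumed reward scheme."""
    if action == "right":
        s2 = min(s + 1, N_STATES - 1)
    elif action == "left":
        s2 = max(s - 1, 0)
    else:
        s2 = s
    done = s2 == GOAL
    return s2, (10.0 if done else -1.0), done


def train(episodes=2000, alpha=0.5, gamma=0.95, eps=0.2, seed=0):
    """Q-learning over the product of MDP state s and automaton state q."""
    rng = random.Random(seed)
    Q = {}
    for _ in range(episodes):
        s, q = 0, 0
        for _ in range(100):
            acts = enabled_actions(q)
            if rng.random() < eps:
                a = rng.choice(acts)  # explore among enabled actions only
            else:
                a = max(acts, key=lambda x: Q.get((s, q, x), 0.0))
            s2, r, done = step(s, a)
            q2 = dfa_step(q, a)
            # Bootstrap only over actions the supervisor allows next.
            best_next = max(Q.get((s2, q2, b), 0.0) for b in enabled_actions(q2))
            target = r if done else r + gamma * best_next
            old = Q.get((s, q, a), 0.0)
            Q[(s, q, a)] = old + alpha * (target - old)
            s, q = s2, q2
            if done:
                break
    return Q


def greedy_rollout(Q, max_steps=50):
    """Follow the learned greedy policy under the supervisor."""
    s, q, trace = 0, 0, []
    for _ in range(max_steps):
        a = max(enabled_actions(q), key=lambda x: Q.get((s, q, x), 0.0))
        trace.append(a)
        s, _, done = step(s, a)
        q = dfa_step(q, a)
        if done:
            break
    return trace, s
```

Because the mask is applied both when choosing actions and when bootstrapping, the learned policy in this sketch can never emit a disallowed action sequence, regardless of how well the values have converged.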
Abstract
In this paper, we use concepts from supervisory control theory of discrete event systems to propose a method to learn optimal control policies for a finite-state Markov decision process (MDP) in which (only) cert