BriefGPT.xyz
Apr, 2019
Non-Stationary Markov Decision Processes: a Worst-Case Approach using Model-Based Reinforcement Learning, Extended Version
Erwan Lecarpentier, Emmanuel Rachelson
TL;DR
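This study addresses robust zero-shot planning in non-stationary stochastic environments. It introduces a specific class of Markov decision processes as the computational model and proposes a zero-shot, model-based, risk-sensitive tree search algorithm.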
Abstract
This work tackles the problem of robust zero-shot planning in non-stationary stochastic environments. We study Markov decision processes (
→