This work tackles the problem of robust zero-shot planning in non-stationary stochastic environments. We study Markov Decision Processes (MDPs) evolving over time and consider Model-Based Reinforcement Learning algorithms in this setting. We make two hypotheses: 1) the environment evolves continuously and its evolution rate is bounded, 2) a current model is known at each decision epoch but not its evolution. Our contribution can be presented in four points. First, we define this specific class of MDPs that we call Non-Stationary MDPs (NSMDPs). We introduce the notion of regular evolution by making an hypothesis of Lipschitz-Continuity on the transition and reward functions w.r.t. time. Secondly, we consider a planning agent using the current model of the environment, but unaware of its future evolution. This leads us to consider a worst-case method where the environment is seen as an adversarial agent. Third, following this approach, we propose the Risk-Averse Tree-Search (RATS) algorithm. This is a zero-shot Model-Based method similar to Minimax search. Finally, we illustrate the benefits brought by RATS empirically and compare its performance with reference Model-Based algorithms.

本研究旨在解决在非恒定随机环境下的鲁棒零-shot规划问题，通过引入定义了特定类别的马尔可夫决策过程来进行计算建模，并提出了一种零-shot基于模型的风险敏感树搜索算法。

非平稳马尔可夫决策过程：基于模型的加强学习最坏情况方法，扩展版