Safe Policy Improvement for Non-Stationary MDPs

October 2020

Towards Safe Policy Improvement for Non-Stationary MDPs

Yash Chandak, Scott M. Jordan, Georgios Theocharous, Martha White, Philip S. Thomas

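TL;DR: To ensure safety with high confidence in high-stakes, smoothly varying non-stationary decision problems, this paper proposes a method that extends a class of safe algorithms known as Seldonian algorithms by combining model-free reinforcement learning with time-series analysis.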

Abstract

Many real-world sequential decision-making problems involve critical systems with financial risks and human-life risks. While several work