Oct, 2020
Towards Safe Policy Improvement for Non-Stationary MDPs
Yash Chandak, Scott M. Jordan, Georgios Theocharous, Martha White, Philip S. Thomas
TL;DR
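To ensure safety, with high confidence, in smoothly varying non-stationary decision problems with high-stakes consequences, this paper proposes a method that extends a type of safe algorithm, called a Seldonian algorithm, through a synthesis of model-free reinforcement learning and time-series analysis.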
Abstract
Many real-world sequential decision-making problems involve critical systems with financial risks and human-life risks. While several works