Quantile Optimization in Preference-based Markov Decision Processes

Dec, 2016

Optimizing Quantiles in Preference-based Markov Decision Processes

Hugo Gilbert, Paul Weng, Yan Xu

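TL;DR: This paper proposes an algorithm for computing a policy that is optimal under the quantile criterion, and evaluates it experimentally on random Markov decision processes and on a data center control problem.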

Abstract

In the Markov decision process model, policies are usually evaluated by expected cumulative rewards. As this decision criterion is not always suitable, we propose in this paper an algorithm for computing a policy optimal for the →