BriefGPT.xyz
Dec, 2016
Optimizing Quantiles in Preference-based Markov Decision Processes
Hugo Gilbert, Paul Weng, Yan Xu
TL;DR
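This paper proposes an algorithm for computing policies that are optimal under a quantile criterion, and evaluates it experimentally on random Markov decision processes and on a data center control problem.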
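To make the contrast between a quantile criterion and the usual expected-value criterion concrete, here is a minimal sketch. It is not the paper's algorithm: the two policies, their sampled returns, and the helper `lower_quantile` are hypothetical and serve only to show that the two criteria can rank policies differently.

```python
import math


def lower_quantile(samples, tau):
    """Return the smallest sample v such that at least a fraction tau
    of the samples are <= v (the lower tau-quantile)."""
    if not 0 < tau <= 1:
        raise ValueError("tau must be in (0, 1]")
    ordered = sorted(samples)
    k = math.ceil(tau * len(ordered))  # how many samples must be <= v
    return ordered[k - 1]


def mean(samples):
    return sum(samples) / len(samples)


# Hypothetical cumulative rewards observed for two policies (assumed data).
returns_a = [0, 0, 0, 0, 100]     # rarely pays off, but pays a lot
returns_b = [10, 10, 10, 10, 10]  # steady outcome

# Expected-value criterion: pick the policy with the higher mean.
best_by_mean = "A" if mean(returns_a) > mean(returns_b) else "B"

# Quantile criterion with tau = 0.5: pick the policy with the higher median.
best_by_median = (
    "A" if lower_quantile(returns_a, 0.5) > lower_quantile(returns_b, 0.5) else "B"
)

print(best_by_mean, best_by_median)
```

Note that `lower_quantile` only sorts and indexes the outcomes, so it needs an ordering over outcomes rather than numeric rewards, whereas `mean` needs numbers that can be added and averaged.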
Abstract
In the Markov decision process model, policies are usually evaluated by expected cumulative rewards. As this decision criterion is not always suitable, we propose in this paper an algorithm for computing a policy optimal for the …