BriefGPT.xyz
Keyword: factored mdp
Search results: 1
Near-optimal Reinforcement Learning in Factored MDPs
Using the posterior sampling reinforcement learning (PSRL) algorithm and the upper confidence bound algorithm UCRL-Factored, in what is known to be a facto…
10 years ago