BriefGPT.xyz
Dec, 2022
Model-Based Reinforcement Learning with Multinomial Logistic Function Approximation
Taehyun Hwang, Min-hwan Oh
TL;DR
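Using an upper-confidence-bound approach, the authors develop a provably efficient reinforcement learning algorithm for MDPs whose state transitions are given by a multinomial logistic model, where the unknown transition core is the information bottleneck. Experiments show that the algorithm also performs well in practice.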
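As a rough illustration of what a multinomial logistic transition model looks like, the sketch below computes next-state probabilities as a softmax over linear scores of state-action-next-state features. The function name, the toy features, and the exact softmax parameterization are assumptions made for illustration; this page does not state the paper's formula, so the paper's own definition may differ in detail.

```python
import math

def mnl_transition_probs(features, theta):
    """Return P(s' | s, a) for each candidate next state under an MNL model.

    features: list of feature vectors phi(s, a, s'), one per reachable
              next state (hypothetical inputs for illustration).
    theta:    the transition core, a parameter vector of the same dimension.
    """
    # Linear score phi(s, a, s')^T theta for each candidate next state.
    scores = [sum(f_i * t_i for f_i, t_i in zip(phi, theta)) for phi in features]
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

# Toy example: three reachable next states, two-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
theta = [0.5, -0.5]
probs = mnl_transition_probs(features, theta)
```

In this toy setup the probabilities sum to one and are ordered by the linear scores, so the first next state is the most likely. In the learning problem the transition core is unknown and has to be estimated from observed transitions.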
Abstract
We study model-based reinforcement learning (RL) for episodic Markov decision processes (MDPs) whose transition probability is parametrized by an unknown transition core with features of state and action. Despite …