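TL;DR: This paper proposes a method for neural transfer learning across RDDL MDP problems of different sizes. Its key innovations include a state encoder and an action decoder with tied parameters, and the method performs strongly in the SysAdmin and Game Of Life domains.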
Abstract
Neural planners for RDDL MDPs produce deep reactive policies in an offline fashion. These scale well to large domains, but are sample-inefficient and time-consuming to train from scratch for each new problem. T