BriefGPT.xyz
Aug, 2019
A Generalized Algorithm for Multi-Objective Reinforcement Learning and Policy Adaptation
Runzhe Yang, Xingyuan Sun, Karthik Narasimhan
TL;DR
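Proposes a multi-objective reinforcement learning algorithm based on a generalized Bellman equation; the algorithm can quickly adapt to new tasks from very few samples and produce optimal policies.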
Abstract
We introduce a new algorithm for multi-objective reinforcement learning (MORL) with linear preferences, with the goal of enabling few-shot adapta
→