Aug, 2019

A Generalized Algorithm for Multi-Objective Reinforcement Learning and Policy Adaptation

Runzhe Yang, Xingyuan Sun, Karthik Narasimhan

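TL;DR: Proposes a multi-objective reinforcement learning algorithm based on a generalized Bellman equation. The algorithm can quickly adapt to new tasks and produce optimal policies from very few samples.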

Abstract

We introduce a new algorithm for multi-objective reinforcement learning (MORL) with linear preferences, with the goal of enabling few-shot adapta