BriefGPT.xyz
Nov, 2022
ARMOR: A Model-based Framework for Improving Arbitrary Baseline Policies with Offline Data
Tengyang Xie, Mohak Bhardwaj, Nan Jiang, Ching-An Cheng
TL;DR
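Proposes ARMOR, a new model-based offline RL framework that optimizes worst-case relative performance under uncertainty and learns robust policy improvements that never degrade the baseline policy under any hyperparameter setting, making it particularly suitable for building real-world learning systems.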
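The idea of optimizing worst-case relative performance can be illustrated with a toy sketch. This is not the ARMOR algorithm itself: it assumes a small finite set of candidate models and policies with known returns, and the function name `relative_pessimism_choice` and all numbers are hypothetical. It only shows why a max-min choice over performance relative to the baseline cannot do worse than the baseline, as long as the baseline is among the candidates.

```python
def relative_pessimism_choice(returns, baseline_idx):
    """Pick the policy with the best worst-case performance relative to the baseline.

    returns[m][p] is the (hypothetical) return of policy p under candidate model m.
    """
    n_policies = len(returns[0])
    worst_case = []
    for p in range(n_policies):
        # Worst case, over all candidate models, of policy p's return
        # minus the baseline's return under the same model.
        worst_case.append(min(row[p] - row[baseline_idx] for row in returns))
    # Max over policies of the min over models.
    best = max(range(n_policies), key=lambda p: worst_case[p])
    return best, worst_case


# Toy data: 3 candidate models (rows), 3 policies (columns); policy 0 is the baseline.
returns = [
    [1.0, 2.0, 1.2],
    [1.0, 0.2, 1.5],
    [1.0, 1.5, 1.2],
]
best, worst_case = relative_pessimism_choice(returns, baseline_idx=0)
print(best, worst_case)
```

The baseline's relative performance against itself is zero under every model, so the selected policy's worst-case relative performance is never negative. Policy 1 looks best under the first model but is rejected because it falls below the baseline under the second.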
Abstract
We propose a new model-based offline RL framework, called adversarial models for offline reinforcement learning (ARMOR), which can robustly learn policies to improve upon an arbitrary baseline policy regardless o