Accelerating exploration and representation learning with offline pre-training

Mar, 2023

Bogdan Mazoure, Jake Bruce, Doina Precup, Rob Fergus, Ankit Anand

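TL;DR: Separately learning a state representation with noise-contrastive estimation and an auxiliary reward model from a single offline dataset can significantly improve sample efficiency on the NetHack benchmark. The work also highlights the various components of our experimental setup and the key insights behind them.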
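To make the noise-contrastive part concrete, here is a minimal sketch of a generic InfoNCE-style loss in plain Python. It is not the authors' implementation. The function name `info_nce_loss`, the `temperature` value, and the choice of positive pairs (a state embedding paired with the embedding of a later state from the same trajectory) are assumptions for illustration, and the embeddings are taken as already produced by some encoder.

```python
import math

def info_nce_loss(anchors, positives, temperature=0.1):
    """Generic InfoNCE loss (illustrative, not the paper's exact objective).

    Row i of `positives` is the positive for row i of `anchors`;
    every other row of `positives` serves as a negative.
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity between anchor i and every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching index as the correct class.
        total += log_norm - logits[i]
    return total / len(a)

# Hypothetical toy embeddings: each "future" vector is close to its own state.
states = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
future_states = [[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.1, 0.0, 0.9]]

aligned = info_nce_loss(states, future_states)
shuffled = info_nce_loss(states, future_states[1:] + future_states[:1])
```

With correctly paired embeddings the loss is close to zero, while mismatched pairs give a much larger value. That gap is the signal a contrastive encoder would be trained to exploit.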

Abstract

Sequential decision-making agents struggle with long-horizon tasks, since solving them requires multi-step reasoning. Most reinforcement learning (RL) algorithms address this challenge by improved credit assignment