Mar, 2023
Accelerating exploration and representation learning with offline pre-training
Bogdan Mazoure, Jake Bruce, Doina Precup, Rob Fergus, Ankit Anand
TL;DR
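Separately learning a state representation via noise-contrastive estimation and an auxiliary reward model from a single offline dataset can significantly improve sample efficiency on the NetHack benchmark; the work also highlights the various components of our experimental setup and the key insights.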
Abstract
Sequential decision-making agents struggle with long horizon tasks, since solving them requires multi-step reasoning. Most reinforcement learning (RL) algorithms address this challenge by improved credit assignment …