BriefGPT.xyz
Dec, 2023
Learning to Act without Actions
Dominik Schmidt, Minqi Jiang
TL;DR
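We introduce LAPO, a method that infers latent actions from action-free demonstrations. It enables effective pre-training of deep reinforcement learning models, which can then be rapidly fine-tuned to expert-level performance. This lays an important foundation for pre-training powerful, general reinforcement learning models on the vast amount of action-free demonstrations available on the web.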
Abstract
Pre-training large models on vast amounts of web data has proven to be an effective approach for obtaining powerful, general models in several domains, including language and vision. However, this paradigm has no