BriefGPT.xyz
Jan, 2024
On Sample-Efficient Offline Reinforcement Learning: Data Diversity, Posterior Sampling, and Beyond
Thanh Nguyen-Tang, Raman Arora
TL;DR
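We propose a novel posterior-sampling-based algorithm for offline RL. Its sample efficiency is comparable to that of version-space-based and empirical-regularization-based algorithms, and it comes with a frequentist sub-optimality bound.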
Abstract
We seek to understand what facilitates sample-efficient learning from historical datasets for sequential decision-making, a problem that is popularly known as …