Oct, 2019
Learning to Predict Without Looking Ahead: World Models Without Forward Prediction
C. Daniel Freeman, Luke Metz, David Ha
TL;DR
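This work introduces a modified reinforcement learning method called "observational dropout", which limits the agent's ability to observe the real environment at each time step and thereby forces the agent to learn a world model that fills in the missing observations. The results indicate that reinforcement learning built on such a model can improve the agent's learning efficiency and task performance.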
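The mechanism described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper's code: the function names, the `peek_prob` parameter (the chance that the agent sees the real observation at a given step), the toy one-dimensional environment, and the hand-written "perfect" world model. In the actual method the world model would be learned rather than supplied by hand.

```python
import random


def rollout_with_observational_dropout(env_step, env_reset, policy, world_model,
                                       peek_prob=0.1, horizon=100, rng=None):
    """Run one episode in which real observations are randomly withheld.

    All names here are hypothetical. At each step the agent sees the real
    observation with probability `peek_prob`; otherwise it sees the world
    model's guess, computed from what it last saw and the action it took.
    """
    rng = rng or random.Random(0)
    agent_obs = env_reset()  # the first observation is always real
    total_reward = 0.0
    peeks = 0
    for _ in range(horizon):
        action = policy(agent_obs)
        real_obs, reward, done = env_step(action)
        total_reward += reward
        if rng.random() < peek_prob:
            agent_obs = real_obs  # the agent is allowed to peek
            peeks += 1
        else:
            agent_obs = world_model(agent_obs, action)  # fill in the gap
        if done:
            break
    return total_reward, peeks


def make_toy_env(start=5):
    """Toy 1-D environment (an assumption): x <- x + action, reward = -|x|."""
    state = {"x": start}

    def reset():
        state["x"] = start
        return state["x"]

    def step(action):
        state["x"] += action
        return state["x"], -abs(state["x"]), False

    return reset, step


if __name__ == "__main__":
    reset, step = make_toy_env(start=5)
    policy = lambda obs: -1 if obs > 0 else 1      # walk toward zero
    perfect_model = lambda obs, action: obs + action  # hand-written stand-in
    print(rollout_with_observational_dropout(
        step, reset, policy, perfect_model, peek_prob=0.0, horizon=10))
```

With a perfect model the agent loses nothing when its observations are withheld, which is the intuition behind the approach: the less often the agent may peek, the more its return depends on how well the model fills the gaps.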
Abstract
Much of model-based reinforcement learning involves learning a model of an agent's world, and training an agent to leverage this model to
→