Efficient exploration with Double Uncertain Value Networks

Nov, 2017

Thomas M. Moerland, Joost Broekens, Catholijn M. Jonker

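TL;DR: Studies directed exploration for reinforcement learning agents by tracking uncertainty about the value of each available action. Parametric uncertainty is estimated with Bayesian dropout, return uncertainty is estimated by propagating a Gaussian distribution through the Bellman equation, and the policy is derived directly from the learned distributions.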
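As a rough illustration of deriving a policy directly from per-action value distributions, the sketch below uses Thompson sampling: draw one value per action from its Gaussian and act greedily on the draws. Everything here is an assumption made for illustration: `toy_predict` is a hypothetical stand-in for one stochastic (dropout) forward pass of a network, and the means, standard deviations, and jitter scale are made-up numbers, not values from the paper.

```python
import numpy as np


def toy_predict(rng):
    """Hypothetical stand-in for one stochastic forward pass with dropout.

    Returns a per-action mean and standard deviation of the return.
    The jitter on the means mimics parametric uncertainty (assumed values).
    """
    base_means = np.array([1.0, 1.2, 0.5])
    base_stds = np.array([0.1, 0.1, 2.0])
    means = base_means + rng.normal(scale=0.05, size=base_means.shape)
    return means, base_stds


def thompson_action(predict, rng):
    """Sample one return per action from its Gaussian, then pick the argmax."""
    means, stds = predict(rng)
    draws = rng.normal(loc=means, scale=stds)
    return int(np.argmax(draws))


def action_frequencies(n_steps=2000, seed=0):
    """Empirical frequency with which each action is selected."""
    rng = np.random.default_rng(seed)
    actions = [thompson_action(toy_predict, rng) for _ in range(n_steps)]
    return np.bincount(actions, minlength=3) / n_steps


if __name__ == "__main__":
    print(action_frequencies())
```

In this toy setup the third action has the lowest mean but the widest distribution, so it is still selected a substantial fraction of the time. That is the sense in which uncertainty directs exploration toward actions whose value is not yet well known.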

Abstract

This paper studies directed exploration for reinforcement learning agents by tracking uncertainty about the value of each available action