Efficient Exploration through Bayesian Deep Q-Networks

Feb, 2018

Efficient Exploration through Bayesian Deep Q-Networks

Kamyar Azizzadenesheli, Emma Brunskill, Animashree Anandkumar

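TL;DR: This paper studies reinforcement learning in high-dimensional settings and proposes two algorithms for it, one based on optimism and one on posterior sampling. It then extends the approach to deep reinforcement learning: the proposed Bayesian Deep Q-Network (BDQN) changes how Q-networks are trained by using Bayesian linear regression, which lets it balance the exploration-exploitation trade-off and apply more effectively to Atari games.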
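To make the idea of posterior sampling with Bayesian linear regression concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' implementation: it assumes a fixed feature vector per state (for example, one produced by a network), an isotropic Gaussian prior, a known noise variance, and one independent linear model per action. The names `blr_posterior`, `thompson_action`, `prior_var`, and `noise_var` are hypothetical, and the data are synthetic.

```python
import numpy as np


def blr_posterior(features, targets, prior_var=1.0, noise_var=1.0):
    """Gaussian posterior over weights w, assuming targets ~ N(features @ w, noise_var)
    and a prior w ~ N(0, prior_var * I). Both variances are illustrative assumptions."""
    dim = features.shape[1]
    precision = np.eye(dim) / prior_var + features.T @ features / noise_var
    cov = np.linalg.inv(precision)
    cov = (cov + cov.T) / 2.0  # remove tiny asymmetries from the numerical inverse
    mean = cov @ features.T @ targets / noise_var
    return mean, cov


def thompson_action(phi, posteriors, rng):
    """Draw one weight vector per action from its posterior, then act greedily
    with respect to the sampled Q-values."""
    sampled_q = [rng.multivariate_normal(mean, cov) @ phi for mean, cov in posteriors]
    return int(np.argmax(sampled_q))


rng = np.random.default_rng(0)
dim, n_actions, n_samples = 4, 3, 200
true_w = rng.normal(size=(n_actions, dim))  # hypothetical ground-truth weights

# Fit one posterior per action from synthetic (feature, target) pairs.
posteriors = []
for a in range(n_actions):
    feats = rng.normal(size=(n_samples, dim))
    targs = feats @ true_w[a] + 0.1 * rng.normal(size=n_samples)
    posteriors.append(blr_posterior(feats, targs, prior_var=1.0, noise_var=0.01))

phi = rng.normal(size=dim)  # feature vector of the current state (synthetic)
action = thompson_action(phi, posteriors, rng)
```

Because the action is chosen from sampled weights and not from the posterior mean, actions whose values are still uncertain are tried more often, and the sampling noise shrinks as the posterior covariance contracts with more data.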

Abstract

We propose Bayesian Deep Q-Network (BDQN), a practical Thompson-sampling-based reinforcement learning (RL) algorithm. Thompson sampling allows for targeted exploration in high dimensions through posterior sampling but is usually computationally expensive. We address this limitation by