Bayesian methods for machine learning have been widely investigated, yielding principled methods for incorporating prior information into inference algorithms. In this survey, we provide an in-depth review of the role of Bayesian methods for the reinforcement learning (RL) paradigm. The major incentives for incorporating Bayesian reasoning in RL are: 1) it provides an elegant approach to action selection (exploration/exploitation) as a function of the uncertainty in learning; and 2) it provides machinery for incorporating prior knowledge into the algorithms. We first discuss models and methods for Bayesian inference in the simple single-step bandit model. We then review the extensive recent literature on Bayesian methods for model-based RL, where prior information can be expressed on the parameters of the Markov model. We also present Bayesian methods for model-free RL, where priors are expressed over the value function or policy class. The objective of the paper is to provide a comprehensive survey of Bayesian RL algorithms and their theoretical and empirical properties.

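This paper examines in depth the role of Bayesian methods in reinforcement learning. It discusses the advantages of using Bayesian reasoning for action selection and for incorporating prior knowledge, outlines the models and methods for Bayesian approaches in the single-step bandit model, model-based RL, and model-free RL, and comprehensively reviews Bayesian RL algorithms together with their theoretical and empirical properties.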

Bayesian Reinforcement Learning: A Survey