This paper demonstrates a novel method for separately estimating aleatoric risk and epistemic uncertainty in deep reinforcement learning. Aleatoric risk, which arises from inherently stochastic environments or agents, must be accounted for in the design of risk-sensitive algorithms. Epistemic uncertainty, which stems from limited data, is important both for risk-sensitivity and to efficiently explore an environment. We first present a Bayesian framework for learning the return distribution in reinforcement learning, which provides theoretical foundations for quantifying both types of uncertainty. Based on this framework, we show that the disagreement between only two neural networks is sufficient to produce a low-variance estimate of the epistemic uncertainty on the return distribution, thus providing a simple and computationally cheap uncertainty metric. We demonstrate experiments that illustrate our method and some applications.

提出了一个框架，通过学习的 Q 值来区分和估计强化学习中源于有限数据的认识不确定性和源于随机环境的aleatoric不确定性，并引入一种考虑不确定性的 DQN 算法，该算法表现出安全的学习行为，并在 MinAtar 测试中表现出优越性能。

深度强化学习中的风险和不确定性估计