Exploration via Epistemic Value Estimation

Mar, 2023

Simon Schmitt, John Shawe-Taylor, Hado van Hasselt

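TL;DR: This paper proposes epistemic value estimation (EVE), a method for efficient exploration in reinforcement learning. EVE is compatible with sequential decision making and with neural network function approximators. It relies on a tractable posterior over the agent's parameters, from which the epistemic value uncertainty can be computed efficiently. Experiments confirm that EVE facilitates efficient exploration on hard exploration tasks.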
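To make the notion of epistemic value uncertainty derived from a parameter posterior concrete, here is a minimal sketch. It is not the EVE algorithm from the paper: it assumes a linear value function with a conjugate Gaussian posterior (ordinary Bayesian linear regression), and the names `gaussian_posterior`, `value_mean_and_std`, `prior_precision`, and `noise_var` are hypothetical choices made for illustration only.

```python
import numpy as np


def gaussian_posterior(features, targets, prior_precision=1.0, noise_var=1.0):
    """Gaussian posterior over the weights of a linear value model.

    Illustrative assumption: value(s) = phi(s) @ theta with a zero-mean
    isotropic Gaussian prior on theta and Gaussian noise on the targets.
    """
    dim = features.shape[1]
    precision = prior_precision * np.eye(dim) + features.T @ features / noise_var
    cov = np.linalg.inv(precision)
    mean = cov @ features.T @ targets / noise_var
    return mean, cov


def value_mean_and_std(phi, mean, cov):
    """Posterior mean and standard deviation of the value at feature vector phi."""
    return phi @ mean, np.sqrt(phi @ cov @ phi)


# Toy data: the first feature is observed 20 times, the second only once.
features = np.array([[1.0, 0.0]] * 20 + [[0.0, 1.0]])
targets = np.ones(21)

mean, cov = gaussian_posterior(features, targets)
_, std_seen = value_mean_and_std(np.array([1.0, 0.0]), mean, cov)
_, std_rare = value_mean_and_std(np.array([0.0, 1.0]), mean, cov)

# Thompson-sampling-style exploration: act greedily with respect to one
# parameter sample drawn from the posterior.
rng = np.random.default_rng(0)
sampled_theta = rng.multivariate_normal(mean, cov)
sampled_values = np.eye(2) @ sampled_theta
```

In this toy setting the rarely observed feature receives the larger posterior standard deviation, so sampled values for it vary more, which is what steers an agent toward less-explored states. With neural network function approximators the exact posterior is not available in closed form, so a method would need some tractable approximation in its place.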

Abstract

How to efficiently explore in reinforcement learning is an open problem. Many exploration algorithms employ the epistemic uncertainty of t