BriefGPT.xyz
Dec, 2019
Parameterized Indexed Value Function for Efficient Exploration in Reinforcement Learning
Tian Tan, Zhihan Xiong, Vikranth R. Dwaracherla
TL;DR
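This paper proposes a new method that uses index sampling to induce exploration. It learns a parameterized indexed value function with a distributional temporal-difference algorithm and demonstrates the method's performance advantage through the proposed dual-network architecture, Parameterized Indexed Networks (PIN).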
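As a rough illustration of how index sampling can induce exploration, the sketch below draws a random index z and acts greedily on an indexed action-value. The summary does not give the functional form, so the "mean plus z times uncertainty" parameterization, the standard-normal index, and the names `indexed_q_values` and `greedy_action` are assumptions made for illustration only, not the authors' implementation.

```python
import numpy as np


def indexed_q_values(mean, std, z):
    """Indexed action-values for one state.

    ASSUMPTION: Q_z(s, a) = mean(s, a) + z * std(s, a). This form is
    hypothetical; the summary only says the value function is indexed.
    """
    return np.asarray(mean) + z * np.asarray(std)


def greedy_action(mean, std, z):
    """Pick the action that maximizes the indexed value for index z."""
    return int(np.argmax(indexed_q_values(mean, std, z)))


# Toy estimates for one state with two actions (made-up numbers).
mean = np.array([1.0, 0.5])  # point estimates of the action values
std = np.array([0.1, 2.0])   # uncertainty attached to each estimate

# ASSUMPTION: the index is drawn from a standard normal distribution.
rng = np.random.default_rng(0)
z = rng.standard_normal()
action = greedy_action(mean, std, z)

# A large positive index favors the uncertain action; a negative one
# falls back to the action with the higher mean.
optimistic = greedy_action(mean, std, 1.0)
pessimistic = greedy_action(mean, std, -1.0)
```

Under these assumptions, the second action has the lower mean but is still chosen whenever the sampled index is large enough, which is the kind of uncertainty-driven exploration the summary refers to.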
Abstract
It is well known that quantifying uncertainty in the action-value estimates is crucial for efficient exploration in reinforcement learning. Ensemble sampling offers a relatively computationally tractable way of d