BriefGPT.xyz
Dec, 2023
Optimistic and Pessimistic Actor in RL: Decoupling Exploration and Utilization
Jingpu Yang, Qirui Zhao, Helin Wang, Yuxiao Huang, Zirui Song...
TL;DR
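The Optimistic and Pessimistic Actor Reinforcement Learning (OPARL) framework takes a distinctive approach built on two actors, one optimistic and one pessimistic, and achieves a notable improvement in the generalization performance of deep neural networks.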
Abstract
Deep neural network (DNN) generalization is limited by the over-reliance of current offline reinforcement learning techniques on conservative processing of existing datasets. This method frequently results in algo…