Aug, 2022
Self-Supervised Exploration via Temporal Inconsistency in Reinforcement Learning
Zijian Gao, Kele Xu, HengXing Cai, Yuanzhao Zhai, Dawei Feng...
TL;DR
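This paper proposes a new intrinsic reward method that uses a self-supervised prediction model and the nuclear norm to measure how far historical knowledge diverges from the current observation. The method targets reinforcement learning under sparse rewards, and the authors report that it outperforms alternatives on several benchmark environments.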
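To make the nuclear-norm idea concrete, here is a minimal Python sketch. It is not the authors' implementation: the summary does not say how the prediction matrix is built, so the sketch assumes (hypothetically) that several snapshots of a prediction model each predict the same next observation, that these predictions are stacked as rows, and that the rows are mean-centered before the norm is taken. The function names `nuclear_norm` and `temporal_inconsistency_reward` are illustrative only.

```python
import numpy as np


def nuclear_norm(matrix):
    """Return the nuclear norm (sum of singular values) of a 2-D array."""
    return float(np.linalg.norm(np.asarray(matrix, dtype=float), ord="nuc"))


def temporal_inconsistency_reward(snapshot_predictions):
    """Hypothetical intrinsic reward from disagreement among model snapshots.

    snapshot_predictions: array of shape (K, D), one row per snapshot,
    each row a D-dimensional prediction for the same next observation.
    Assumption (not from the source): rows are mean-centered so that
    snapshots that fully agree yield a reward of zero.
    """
    preds = np.asarray(snapshot_predictions, dtype=float)
    if preds.ndim != 2:
        raise ValueError("expected a 2-D array of shape (K, D)")
    centered = preds - preds.mean(axis=0, keepdims=True)
    return nuclear_norm(centered)
```

Under these assumptions, identical rows give a reward of zero, and the reward grows as the snapshots' predictions spread out in more directions, which is one way to read "discrepancy between historical knowledge and the current observation".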
Abstract
In real-world scenarios, reinforcement learning under sparse-reward synergistic settings has remained challenging, despite surging interest in this field. Previous attempts suggest that …