Mastering multiple tasks through exploration and learning in an environment
poses a significant challenge in reinforcement learning (RL). Unsupervised RL
has been introduced to address this challenge by training policies with
intrinsic rewards rather than extrinsic rewards. However, current intrinsic
reward designs and unsupervised RL algorithms often overlook the heterogeneous
nature of collected samples, thereby diminishing their sample efficiency. To
overcome this limitation, we propose a reward-free RL algorithm called
GFA-RFE. Our algorithm combines two ideas: an uncertainty-aware intrinsic
reward for exploring the environment, and an uncertainty-weighted learning
process that handles the heterogeneous uncertainty across samples.
Theoretically, we show that in order to find an $\epsilon$-optimal policy,
GFA-RFE needs to collect $\tilde{O} (H^2 \log N_{\mathcal F} (\epsilon)
\mathrm{dim} (\mathcal F) / \epsilon^2 )$ episodes, where $\mathcal
F$ is the value function class with covering number $N_{\mathcal F} (\epsilon)$
and generalized eluder dimension $\mathrm{dim} (\mathcal F)$. This sample
complexity improves on that of all existing reward-free RL algorithms. We
further implement and
evaluate GFA-RFE across various domains and tasks in the DeepMind Control
Suite. Experimental results show that GFA-RFE performs comparably to or better
than state-of-the-art unsupervised RL algorithms.
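
To make the two ingredients above concrete, the following is a minimal sketch and not the GFA-RFE algorithm itself. It assumes a linear function class in place of the general class $\mathcal F$, uses the elliptical bonus $\sqrt{\phi^\top \Sigma^{-1} \phi}$ as a stand-in uncertainty measure, and adopts a hypothetical weighting rule $1 / \max(1, b/\text{threshold})^2$. The function names, the `threshold` parameter, and the weighting rule are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def uncertainty(phi, cov_inv):
    """Elliptical bonus sqrt(phi^T Sigma^{-1} phi); large where data is scarce."""
    return float(np.sqrt(phi @ cov_inv @ phi))


def weighted_ridge(features, targets, weights, lam=1.0):
    """Solve min_w sum_i weights_i * (phi_i^T w - y_i)^2 + lam * ||w||^2."""
    d = features.shape[1]
    weighted = features * weights[:, None]
    cov = lam * np.eye(d) + weighted.T @ features
    return np.linalg.solve(cov, weighted.T @ targets)


def explore_and_fit(features, targets, lam=1.0, threshold=0.5):
    """Process samples in order, weighting each by its uncertainty when seen.

    Returns the fitted parameters, the per-sample bonuses (which would play
    the role of an intrinsic reward), and the per-sample regression weights.
    """
    d = features.shape[1]
    cov_inv = np.eye(d) / lam
    bonuses, weights = [], []
    for phi in features:
        b = uncertainty(phi, cov_inv)
        # Assumed rule: down-weight samples whose uncertainty exceeds threshold.
        w = 1.0 / max(1.0, b / threshold) ** 2
        bonuses.append(b)
        weights.append(w)
        # Sherman-Morrison update for (Sigma + w * phi phi^T)^{-1}.
        v = cov_inv @ phi
        cov_inv = cov_inv - w * np.outer(v, v) / (1.0 + w * (phi @ v))
    weights = np.array(weights)
    w_hat = weighted_ridge(features, targets, weights, lam)
    return w_hat, np.array(bonuses), weights
```

In this sketch, repeatedly visiting the same feature direction shrinks its bonus, so an agent rewarded by the bonus is pushed toward unexplored directions, while the weights keep highly uncertain samples from dominating the regression.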

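Mastering multiple tasks through exploration and learning in an environment is a significant challenge in reinforcement learning. This paper introduces a reward-free reinforcement learning algorithm whose key idea is to explore the environment with an uncertainty-aware intrinsic reward and to handle heterogeneous uncertainty through uncertainty-weighted learning across samples. Experiments across domains and tasks in the DeepMind Control Suite show that the algorithm outperforms or is comparable to existing unsupervised reinforcement learning algorithms.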