In this work, we investigate how curiosity can be applied to replay buffers
to improve offline multi-task continual reinforcement learning when tasks,
which are defined by non-stationarity in the environment, are unlabeled
and unevenly exposed to the learner over time. In particular, we investigate
the use of curiosity both as a tool for task boundary detection and as a
priority metric for retaining old transition tuples, and we propose one
buffer for each use. First, we propose a Hybrid Reservoir Buffer with Task
Separation (HRBTS), in which curiosity is used to detect task boundaries
that are unknown because the problem is task-agnostic. Second, we propose a
Hybrid Curious Buffer (HCB), in which curiosity serves as the priority
metric for retaining old transition tuples. We
ultimately show that these buffers, in conjunction with standard reinforcement
learning algorithms, can alleviate the catastrophic forgetting suffered by
state-of-the-art replay buffers when the agent's exposure to tasks is
unequal over time. We evaluate catastrophic forgetting and the
efficiency of our proposed buffers against recent works such as the Hybrid
Reservoir Buffer (HRB) and the Multi-Time Scale Replay Buffer (MTR) in three
different continual reinforcement learning settings. Experiments were
conducted on classical control tasks and the Metaworld environment. They show
that our proposed replay buffers are more resistant to catastrophic forgetting
than existing works in most of the settings.
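
The two uses of curiosity described above can be illustrated with a minimal
sketch. It is not the HRBTS or HCB algorithm itself, whose details the abstract
does not give. It assumes that a scalar curiosity score is already available
for each transition (for example the prediction error of a learned forward
model, which is an assumption here). The names CuriosityPriorityBuffer and
detect_task_boundaries, the eviction rule, and the parameters window and k are
all hypothetical.

```python
import heapq
import itertools
from collections import deque
from statistics import mean, pstdev


class CuriosityPriorityBuffer:
    """Fixed-capacity buffer that retains the most curious transitions.

    Hypothetical illustration: the eviction rule is an assumption,
    not the HCB algorithm from the paper.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []  # min-heap of (curiosity, insertion_id, transition)
        self._ids = itertools.count()  # tie-breaker, so transitions are never compared

    def add(self, transition, curiosity):
        entry = (curiosity, next(self._ids), transition)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
        elif curiosity > self._heap[0][0]:
            # Buffer is full: replace the least curious stored transition.
            heapq.heapreplace(self._heap, entry)

    def transitions(self):
        return [t for _, _, t in self._heap]


def detect_task_boundaries(curiosity_stream, window=20, k=3.0):
    """Return indices where curiosity exceeds mean + k * std of the recent window.

    Assumed heuristic: a spike in curiosity suggests that the environment
    dynamics, and hence the task, have changed.
    """
    recent = deque(maxlen=window)
    boundaries = []
    for i, c in enumerate(curiosity_stream):
        if len(recent) == window:
            threshold = mean(recent) + k * pstdev(recent)
            if c > threshold:
                boundaries.append(i)
                recent.clear()  # restart the statistics for the new task
        recent.append(c)
    return boundaries
```

In this sketch a detected boundary could be used to open a new per-task
partition of the buffer, while the priority buffer decides which old
transitions survive once capacity is reached.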

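This study investigates how curiosity can be applied to replay buffers to improve offline multi-task continual reinforcement learning when tasks, which are defined by non-stationarity in the environment, are unlabeled and unevenly exposed to the learner over time.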

Using Curiosity to Achieve a Balanced Representation of Tasks in Continual Offline Reinforcement Learning

Using Curiosity for an Even Representation of Tasks in Continual Offline Reinforcement Learning

Self-supervised learning (SSL) aims to eliminate one of the major bottlenecks
in representation learning: the need for human annotations. As a result, SSL
holds the promise to learn representations from data in-the-wild, i.e., without
the need for finite and static datasets. Instead, true SSL algorithms should be
able to exploit the continuous stream of data being generated on the internet
or by agents exploring their environments. But do traditional self-supervised
learning approaches work in this setup? In this work, we investigate this
question by conducting experiments on the continuous self-supervised learning
problem. While learning in the wild, we expect to see a continuous (infinite)
non-IID data stream that follows a non-stationary distribution of visual
concepts. The goal is to learn a representation that is robust and adaptive,
yet not forgetful of concepts seen in the past. We show that a direct
application of current methods to such a continuous setup 1) is inefficient both
computationally and in the amount of data required, 2) leads to inferior
representations due to temporal correlations (non-IID data) in some sources of
streaming data and 3) exhibits signs of catastrophic forgetting when trained on
sources with non-stationary data distributions. We propose the use of replay
buffers as an approach to alleviate the issues of inefficiency and temporal
correlations. We further propose a novel method to enhance the replay buffer by
maintaining the least redundant samples. Minimum redundancy (MinRed) buffers
allow us to learn effective representations even in the most challenging
streaming scenarios composed of sequential visual data obtained from a single
embodied agent, and alleviate the problem of catastrophic forgetting when
learning from data with non-stationary semantic distributions.
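
The idea of keeping the least redundant samples can be sketched as follows.
This is a minimal illustration under stated assumptions, not the MinRed
implementation. It assumes that each sample comes with a non-zero feature
vector, and that redundancy is measured as the cosine distance to the nearest
stored neighbor. The class name MinRedundancyBuffer and its methods are
hypothetical.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two non-zero vectors given as sequences of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


class MinRedundancyBuffer:
    """Fixed-capacity buffer that evicts the sample closest to its nearest neighbor.

    Hypothetical illustration: the redundancy measure is an assumption.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []  # list of (sample, feature) pairs

    def add(self, sample, feature):
        self.items.append((sample, feature))
        if len(self.items) > self.capacity:
            self._evict_most_redundant()

    def _evict_most_redundant(self):
        # O(n^2) scan: for every item, find the distance to its nearest neighbor.
        worst_idx, worst_dist = None, math.inf
        for i, (_, fi) in enumerate(self.items):
            nearest = min(
                cosine_distance(fi, fj)
                for j, (_, fj) in enumerate(self.items)
                if j != i
            )
            # Strict comparison: on a tie, the older sample is evicted.
            if nearest < worst_dist:
                worst_idx, worst_dist = i, nearest
        del self.items[worst_idx]

    def samples(self):
        return [s for s, _ in self.items]
```

The quadratic scan keeps the sketch short. A practical buffer would likely
cache pairwise distances and update them incrementally as samples are added
and removed.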

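This paper studies the application and efficiency of self-supervised learning on continuous streaming data, and proposes replay buffers that retain minimally redundant samples to enhance learning. Experimental results show that these methods can effectively improve the accuracy and robustness of representation learning and are less prone to catastrophic forgetting under non-stationary semantic distributions.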