Deep reinforcement learning methods exhibit impressive performance on a range
of tasks but still struggle on hard exploration tasks in large environments
with sparse rewards. To address this, intrinsic rewards can be generated using
forward model prediction errors that decrease as the environment becomes known,
and incentivize an agent to explore novel states. While prediction-based
intrinsic rewards can help agents solve hard exploration tasks, they can suffer
from catastrophic forgetting and actually increase at visited states. We first
examine the conditions and causes of catastrophic forgetting in grid world
environments. We then propose a new method, FARCuriosity, inspired by how humans
and animals learn. The method depends on fragmentation and recall: an agent
fragments an environment based on surprisal, and uses different local curiosity
modules (prediction-based intrinsic reward functions) for each fragment so that
modules are not trained on the entire environment. At each fragmentation event,
the agent stores the current module in long-term memory (LTM) and either
initializes a new module or recalls a previously stored module based on its
match with the current state. With fragmentation and recall, FARCuriosity
achieves less forgetting and better overall performance in games with varied
and heterogeneous environments in the Atari benchmark suite of tasks. Thus,
this work highlights the problem of catastrophic forgetting in prediction-based
curiosity methods and proposes a solution.
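
The fragmentation-and-recall loop described above can be sketched in a few lines. This is only a toy illustration under stated assumptions, not the authors' implementation: the linear forward model, the fixed surprisal threshold, the warm-up period (`min_steps`), and the anchor-distance recall rule are all hypothetical choices made here to keep the example small.

```python
import numpy as np


class LinearCuriosity:
    """Toy local curiosity module: a linear forward model trained by SGD.

    The intrinsic reward is the squared prediction error of the next state.
    """

    def __init__(self, dim, anchor, lr=0.5):
        self.W = np.zeros((dim, dim))
        # Assumption: a module is matched to states via the state it was created at.
        self.anchor = np.asarray(anchor, dtype=float)
        self.lr = lr
        self.steps = 0

    def error(self, s, s_next):
        return s_next - self.W @ s

    def reward(self, s, s_next):
        e = self.error(s, s_next)
        return float(e @ e)

    def update(self, s, s_next):
        self.W += self.lr * np.outer(self.error(s, s_next), s)
        self.steps += 1


class FragmentAndRecall:
    """Hypothetical wrapper: switch modules when surprisal is high."""

    def __init__(self, dim, threshold=0.5, recall_radius=0.5, min_steps=4):
        self.dim = dim
        self.threshold = threshold          # surprisal level that triggers fragmentation
        self.recall_radius = recall_radius  # max anchor distance for a recall
        self.min_steps = min_steps          # warm-up so a fresh module cannot fragment
        self.ltm = []                       # long-term memory of stored modules
        self.current = None

    def step(self, s, s_next):
        s = np.asarray(s, dtype=float)
        s_next = np.asarray(s_next, dtype=float)
        if self.current is None:
            self.current = LinearCuriosity(self.dim, anchor=s)
        r = self.current.reward(s, s_next)
        if r > self.threshold and self.current.steps >= self.min_steps:
            # Fragmentation event: store the current module, then recall or create.
            if all(m is not self.current for m in self.ltm):
                self.ltm.append(self.current)
            self.current = self._recall_or_create(s_next)
            return r  # the boundary transition is not used for training
        self.current.update(s, s_next)
        return r

    def _recall_or_create(self, state):
        if self.ltm:
            best = min(self.ltm, key=lambda m: np.linalg.norm(m.anchor - state))
            if np.linalg.norm(best.anchor - state) <= self.recall_radius:
                return best
        return LinearCuriosity(self.dim, anchor=state)
```

In this sketch, an agent that leaves a well-learned region and later returns to it recalls the stored module, so the intrinsic reward at the revisited states stays low instead of rising again because a single model was overwritten elsewhere.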

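Deep reinforcement learning methods perform well on many tasks but still struggle on hard exploration tasks in large environments with sparse rewards. This study finds that prediction-based intrinsic reward methods can suffer from catastrophic forgetting, and proposes a new method called FARCuriosity, which mitigates catastrophic forgetting through fragmentation and recall and improves performance in games with varied environments.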

Neuro-Inspired Fragmentation and Recall: Addressing Catastrophic Forgetting in Curiosity

Neuro-Inspired Fragmentation and Recall to Overcome Catastrophic Forgetting in Curiosity

This thesis studies the domain of collective robotics, and more particularly
the optimization problems of multirobot systems in the context of exploration,
path planning and coordination. It includes two contributions. The first one is
the use of the Butterfly Optimization Algorithm (BOA) to solve the Unknown Area
Exploration problem with energy constraints in dynamic environments. To our
knowledge, this algorithm has not previously been used to solve robotics
problems. We propose a new version of this algorithm, called xBOA, based on the
crossover operator to improve the diversity of the candidate solutions and
speed up the convergence of the algorithm. The second contribution is the
development of a new simulation framework for benchmarking dynamic incremental
problems in robotics such as exploration tasks. The framework is designed to be
generic, so that different metaheuristics can be compared quickly with minimal
modifications and it adapts easily to single- and multi-robot scenarios. Also,
it provides researchers with tools to automate their experiments and generate
visuals, which will allow them to focus on more important tasks such as
modeling new algorithms. We conducted a series of experiments that showed
promising results and allowed us to validate our approach and model.

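This thesis studies collective robotics, in particular the optimization problems of multi-robot systems in exploration, path planning, and coordination, and proposes a solution based on the Butterfly Optimization Algorithm and a new simulation framework.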

Contribution to the Optimization of a Collective Behavior for a Group of Autonomous Robots

Contribution à l'Optimisation d'un Comportement Collectif pour un Groupe de Robots Autonomes

We propose a new method for learning from a single demonstration to solve
hard exploration tasks like the Atari game Montezuma's Revenge. Instead of
imitating human demonstrations, as proposed in other recent works, our approach
is to maximize rewards directly. Our agent is trained using off-the-shelf
reinforcement learning, but starts every episode by resetting to a state from a
demonstration. By starting from such demonstration states, the agent requires
much less exploration to learn a game compared to when it starts from the
beginning of the game at every episode. We analyze reinforcement learning for
tasks with sparse rewards in a simple toy environment, where we show that the
run-time of standard RL methods scales exponentially in the number of states
between rewards. Our method reduces this to quadratic scaling, opening up many
tasks that were previously infeasible. We then apply our method to Montezuma's
Revenge, for which we present a trained agent achieving a high score of 74,500,
better than any previously published result.
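
A toy calculation can make the benefit of resetting to demonstration states concrete. The sketch below is an illustration only, under assumptions introduced here: `ChainEnv`, `run_episode`, and `random_success_probability` are hypothetical names, and the chain (where any wrong action ends the episode) is not necessarily the toy environment analyzed in the paper.

```python
class ChainEnv:
    """Toy sparse-reward chain (hypothetical).

    Only action 0 advances; any other action ends the episode with no reward.
    A reward of 1 is given only on reaching the final state.
    """

    def __init__(self, length, n_actions=2):
        self.length = length
        self.n_actions = n_actions
        self.state = 0

    def reset(self, state=0):
        # Resetting to an arbitrary state models "start from a demonstration state".
        self.state = state
        return self.state

    def step(self, action):
        if action != 0:
            return self.state, 0.0, True
        self.state += 1
        done = self.state == self.length
        return self.state, (1.0 if done else 0.0), done


def run_episode(env, policy, start_state=0):
    """Roll out `policy` from `start_state` and return the total reward."""
    state = env.reset(start_state)
    done = False
    total = 0.0
    while not done:
        state, reward, done = env.step(policy(state))
        total += reward
    return total


def random_success_probability(length, start_state, n_actions=2):
    """Chance that a uniformly random policy reaches the reward in one episode."""
    return (1.0 / n_actions) ** (length - start_state)
```

In this chain, a random policy starting at the beginning of a 10-state chain finds the reward with probability 2^-10 per episode, whereas one reset to a demonstration state two steps from the end succeeds with probability 1/4, so far fewer episodes are needed before any reward signal is seen.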

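Proposes a new method for learning from a single demonstration to solve hard exploration tasks such as Montezuma's Revenge. The method trains the agent by maximizing reward directly, which shortens learning time and reduces task complexity.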