Poor sample efficiency is a notorious problem in Reinforcement Learning (RL).
Compared to single-agent RL, sample efficiency is even harder to achieve in
Multi-Agent Reinforcement Learning (MARL) because of its inherent
partial observability, non-stationary training, and enormous strategy space.
While much effort has gone into developing new methods that enhance
sample efficiency, we instead examine the widely used episodic training mechanism. In
each training step, tens of frames are collected, but only one gradient step is
made. We argue that this episodic training could be a source of poor sample
efficiency. To better exploit the data already collected, we propose to
increase the frequency of the gradient updates per environment interaction
(a.k.a. Replay Ratio or Update-To-Data ratio). To show its generality, we
evaluate three MARL methods on six SMAC tasks. The empirical results validate
that a higher replay ratio significantly improves the sample efficiency of
MARL algorithms. The code to reproduce the results presented in this paper
is open-sourced at this https URL
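
To make the replay ratio concrete, the following is a minimal sketch of a collect-then-update loop, not the paper's implementation. The function name `run_training`, the placeholder transitions, the buffer size, and the choice to count updates per collected frame are all assumptions made for illustration.

```python
import random
from collections import deque


def run_training(total_env_steps, frames_per_collect, updates_per_collect,
                 buffer_capacity=5000, batch_size=32, seed=0):
    """Count gradient updates for a given collect/update schedule.

    Hypothetical helper: the transitions are dummy floats and the
    "gradient step" is only counted, never computed.
    """
    rng = random.Random(seed)
    buffer = deque(maxlen=buffer_capacity)  # oldest data is evicted first
    env_steps = 0
    gradient_updates = 0

    while env_steps < total_env_steps:
        # Collection phase: interact with the environment.
        for _ in range(frames_per_collect):
            buffer.append(rng.random())  # placeholder for a real transition
            env_steps += 1

        # Update phase: reuse the stored data updates_per_collect times.
        for _ in range(updates_per_collect):
            batch = rng.sample(list(buffer), min(batch_size, len(buffer)))
            # A real learner would compute a loss on `batch` and take
            # one optimizer step here.
            gradient_updates += 1

    replay_ratio = gradient_updates / env_steps
    return env_steps, gradient_updates, replay_ratio
```

Under these assumptions, `run_training(1000, 50, 1)` performs 20 updates (a replay ratio of 0.02), while `run_training(1000, 50, 4)` performs 80 updates on the same 1000 frames (a replay ratio of 0.08). The environment budget is unchanged; only the reuse of collected data increases.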

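Increasing the replay ratio (also called the update-to-data ratio) can significantly improve the sample efficiency of multi-agent reinforcement learning algorithms.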

Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning

Plasticity, the ability of a neural network to evolve with new data, is
crucial for high-performance and sample-efficient visual reinforcement learning
(VRL). Although methods like resetting and regularization can potentially
mitigate plasticity loss, the influences of various components within the VRL
framework on the agent's plasticity are still poorly understood. In this work,
we conduct a systematic empirical exploration focusing on three primary
underexplored facets and derive the following insightful conclusions: (1) data
augmentation is essential in maintaining plasticity; (2) the critic's
plasticity loss serves as the principal bottleneck impeding efficient training;
and (3) without timely intervention to recover the critic's plasticity in the early
stages, its loss becomes catastrophic. These insights suggest a novel strategy
to address the high replay ratio (RR) dilemma, where exacerbated plasticity
loss hinders the potential improvements of sample efficiency brought by
increased reuse frequency. Rather than setting a static RR for the entire
training process, we propose Adaptive RR, which dynamically adjusts the RR
based on the critic's plasticity level. Extensive evaluations indicate that
Adaptive RR not only avoids catastrophic plasticity loss in the early stages
but also benefits from more frequent reuse in later phases, resulting in
superior sample efficiency.
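
The idea of switching the replay ratio based on the critic's plasticity can be sketched as follows. This is an illustrative sketch only: the abstract does not specify the plasticity measure or the switching rule, so the proxy used here (the fraction of positive post-ReLU activations), the function names, the window, the tolerance, and the two RR values are all assumptions.

```python
def fraction_active(activations):
    """Fraction of positive entries in a batch of post-ReLU activations.

    `activations` is a list of rows (one per sample), each a list of
    unit outputs. Used here as an assumed proxy for plasticity.
    """
    total = sum(len(row) for row in activations)
    active = sum(1 for row in activations for value in row if value > 0)
    return active / total


def adaptive_replay_ratio(plasticity_history, low_rr=0.5, high_rr=2.0,
                          window=3, tolerance=0.01):
    """Pick a replay ratio from the recent trend of the plasticity proxy.

    Hypothetical rule: stay at `low_rr` while the proxy is still moving,
    and switch to `high_rr` once it has varied by less than `tolerance`
    over the last `window + 1` measurements.
    """
    if len(plasticity_history) < window + 1:
        return low_rr  # not enough evidence yet, stay conservative
    recent = plasticity_history[-(window + 1):]
    drift = max(recent) - min(recent)
    return high_rr if drift < tolerance else low_rr
```

A training loop would append `fraction_active(...)` of the critic's hidden activations to the history at regular intervals and call `adaptive_replay_ratio` to decide how many updates to run per environment step. Any real schedule would need its thresholds tuned for the task.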

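Plasticity is a central research topic for high-performance, sample-efficient visual reinforcement learning with neural networks. Through a systematic empirical study, this work shows how key components such as data augmentation, the critic's plasticity loss, and plasticity recovery affect plasticity. It proposes a strategy that addresses the high replay ratio dilemma by dynamically adjusting the replay ratio according to the critic's plasticity level. The strategy avoids plasticity loss in the early stages and improves sample efficiency through more frequent reuse in later stages.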

Revisiting Plasticity in Visual Reinforcement Learning: Data, Modules and Training Stages

Experience replay is central to off-policy algorithms in deep reinforcement
learning (RL), but there remain significant gaps in our understanding. We
therefore present a systematic and extensive analysis of experience replay in
Q-learning methods, focusing on two fundamental properties: the replay capacity
and the ratio of learning updates to experience collected (replay ratio). Our
additive and ablative studies upend conventional wisdom around experience
replay -- greater capacity is found to substantially increase the performance
of certain algorithms, while leaving others unaffected. Counterintuitively, we
show that theoretically ungrounded, uncorrected n-step returns are uniquely
beneficial, while other techniques confer limited benefit for sifting through
larger memory. Separately, by directly controlling the replay ratio we
contextualize previous observations in the literature and empirically measure
its importance across a variety of deep RL algorithms. Finally, we conclude by
testing a set of hypotheses on the nature of these performance benefits.
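
For reference, an uncorrected n-step return sums the discounted rewards along a replayed trajectory segment and then bootstraps from a value estimate, with no importance-sampling correction for the off-policy data. The sketch below is a minimal illustration; the function name and arguments are hypothetical, and `bootstrap_value` stands in for an estimate such as the maximum Q-value at the state reached after n steps.

```python
def n_step_target(rewards, bootstrap_value, gamma):
    """Uncorrected n-step target for a Q-learning update.

    rewards:         the n rewards r_t, ..., r_{t+n-1} from the replay buffer
    bootstrap_value: value estimate at s_{t+n} (e.g. max over actions of Q)
    gamma:           discount factor

    No off-policy correction is applied, which is what "uncorrected" means.
    """
    target = bootstrap_value
    # Fold from the last reward backwards: G = r + gamma * G_next.
    for reward in reversed(rewards):
        target = reward + gamma * target
    return target
```

With a single reward this reduces to the usual one-step Q-learning target, so the same function covers both cases.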

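Through a systematic analysis of two fundamental properties of experience replay in Q-learning methods, the replay capacity and the ratio of learning updates to experience collected (replay ratio), this paper upends conventional wisdom about experience replay. It also measures the importance of the replay ratio by controlling it directly, and tests a set of hypotheses about the nature of the resulting performance benefits.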