Artificial neural networks are promising as general function approximators but are challenging to train on data that are not independent and identically distributed (non-i.i.d.), because of catastrophic forgetting. Experience replay, a standard component in deep reinforcement learning, is often used to reduce forgetting and improve sample efficiency by storing experiences in a large buffer and reusing them for training later. However, a large replay buffer imposes a heavy memory burden, especially on onboard and edge devices with limited memory capacity. To alleviate this problem, we propose memory-efficient reinforcement learning algorithms based on the deep Q-network algorithm. Our algorithms reduce forgetting and maintain high sample efficiency by consolidating knowledge from the target Q-network to the current Q-network. Compared to baseline methods, our algorithms achieve comparable or better performance on both feature-based and image-based tasks while easing the burden of large experience replay buffers.
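
The abstract does not spell out how knowledge is consolidated from the target Q-network to the current Q-network. One plausible reading, given here only as a minimal sketch, is to add a penalty that keeps the current network's Q-values close to the target network's Q-values on the same states, alongside the usual DQN temporal-difference loss. The squared-difference form of the penalty, the weight `lam`, and all function and argument names below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def dqn_td_loss(q_sa, reward, next_q_target, done, gamma=0.99):
    """Standard one-step DQN loss: mean squared TD error.

    q_sa:          (B,)   current-network Q-values of the actions taken
    reward:        (B,)   observed rewards
    next_q_target: (B, A) target-network Q-values at the next states
    done:          (B,)   1.0 where the episode ended, else 0.0
    """
    td_target = reward + gamma * (1.0 - done) * next_q_target.max(axis=1)
    return float(np.mean((q_sa - td_target) ** 2))


def consolidation_loss(q_current, q_target):
    """Assumed consolidation term (hypothetical form): mean squared gap
    between current-network and target-network Q-values, over all actions,
    evaluated on the same batch of states. Both arrays have shape (B, A)."""
    return float(np.mean((q_current - q_target) ** 2))


def total_loss(q_sa, reward, next_q_target, done,
               q_current, q_target, lam=1.0, gamma=0.99):
    """TD loss plus a weighted consolidation penalty; lam is an assumed
    hyperparameter that trades off new learning against retention."""
    td = dqn_td_loss(q_sa, reward, next_q_target, done, gamma)
    return td + lam * consolidation_loss(q_current, q_target)
```

In this sketch, the consolidation term needs only states to evaluate both networks on, not full stored transitions, which is one way such a penalty might let a much smaller replay buffer suffice. How those states are chosen is left open here.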

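We propose memory-efficient reinforcement learning algorithms based on the deep Q-network algorithm that reduce forgetting and maintain high sample efficiency by consolidating knowledge from the target Q-network to the current Q-network. Compared to baseline methods, they achieve comparable or better performance on both feature-based and image-based tasks while easing the burden of large experience replay buffers.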

Memory-Efficient Reinforcement Learning with Knowledge Consolidation