Averaged-DQN: Variance Reduction and Stability Improvement for Deep Reinforcement Learning

Nov, 2016

Deep Reinforcement Learning with Averaged Target DQN

Oron Anschel, Nir Baram, Nahum Shimkin

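TL;DR: Proposes Averaged-DQN, an extension to deep reinforcement learning algorithms based on averaging Q-values. By reducing the variance of the approximation error in the target values, it mitigates the instability and variability of the DQN algorithm. Experimental results show improved stability and performance on the Arcade Learning Environment benchmark.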
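The Q-value averaging idea can be illustrated with a minimal sketch. It assumes the average is a uniform mean over the next-state Q-values from K previously learned estimates, which then replaces the single-network estimate in the bootstrapped target. The function name `averaged_q_target`, its parameters, and the toy numbers are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import numpy as np


def averaged_q_target(next_q_snapshots, reward, gamma=0.99, done=False):
    """Bootstrapped target built from an average of K Q-value estimates.

    next_q_snapshots: array-like of shape (K, num_actions); row k holds the
    k-th stored estimate of Q(s', a) for every action a (assumed layout).
    """
    q = np.asarray(next_q_snapshots, dtype=float)
    # Average the K estimates action-wise BEFORE taking the max, so that
    # noise in any single estimate has less influence on the chosen value.
    avg_q = q.mean(axis=0)
    # No bootstrapping from a terminal state.
    bootstrap = 0.0 if done else gamma * avg_q.max()
    return reward + bootstrap


# Toy example: two stored estimates, two actions.
snapshots = [[1.0, 2.0],
             [3.0, 0.0]]
target = averaged_q_target(snapshots, reward=1.0, gamma=0.5)

# For comparison, a target that uses only the most recent estimate.
latest_only = averaged_q_target(snapshots[-1:], reward=1.0, gamma=0.5)
```

In this toy case the averaged action values are `[2.0, 1.0]`, so the averaged target is `1.0 + 0.5 * 2.0 = 2.0`, while the target based on the latest estimate alone is `1.0 + 0.5 * 3.0 = 2.5`. The single outlying value of 3.0 is damped by the average, which is the variance-reduction effect the summary describes.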

Abstract

The commonly used Q-learning algorithm combined with function approximation induces systematic overestimations of state-action values. These systematic errors might cause instability, poor performance and sometimes divergence of learning. In this work, we present the Averaged Target DQN (ADQN) algorithm, an adaptation to the DQN class of algorithms