Using deep neural nets as function approximator for reinforcement learning
tasks have recently been shown to be very powerful for solving problems
approaching real-world complexity. Using these results as a benchmark, we
discuss the role that the discount factor may play in the quality of the
learning process of a deep Q-network (DQN). When the discount factor
progressively increases up to its final value, we empirically show that it is
possible to significantly reduce the number of learning steps. When used in
conjunction with a varying learning rate, we empirically show that it
outperforms original DQN on several experiments. We relate this phenomenon with
the instabilities of neural networks when they are used in an approximate
Dynamic Programming setting. We also describe the possibility to fall within a
local optimum during the learning process, thus connecting our discussion with
the exploration/exploitation dilemma.

本文研究使用深度神经网络作为函数逼近器来解决逼近真实世界复杂度的强化学习问题。同时，我们探讨了折扣因子在深度 Q 网络（DQN）学习过程中所起的作用，实验结果表明在逐渐增加折扣因子值的情况下，可以显著降低 DQN 学习步骤的数量。当与变动的学习率一起使用时，其在多项实验中均优于原始 DQN，并将这一现象与神经网络在近似于动态规划设置中的不稳定性联系起来，同时描述了在学习过程中可能陷入局部最优解的可能性，从而将我们的讨论与探索 / 利用困境联系起来。