Kazuki Irie, Anand Gopalakrishnan, Jürgen Schmidhuber
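TL;DR: This paper studies actor-critic methods that combine real-time recurrent learning with policy gradients, evaluated in the DMLab, ProcGen, and Atari-2600 environments. On DMLab memory tasks, our system achieves strong performance when trained on fewer than 1.2 B environment frames, compared with IMPALA and R2D2 baselines trained on 10 B frames.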
Abstract
Real-time recurrent learning (RTRL) for sequence-processing recurrent neural networks (RNNs) offers certain conceptual advantages over backpropagation through time (BPTT). RTRL requires neither caching past activ