Klaus Greff, Rupesh Kumar Srivastava, Jan Koutník, Bas R. Steunebrink, Jürgen Schmidhuber
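TL;DR: This paper presents a large-scale analysis of the role and utility of the various computational components of typical Long Short-Term Memory (LSTM) networks, comparing eight LSTM variants on three representative tasks. The results show that the forget gate and the output activation function are the most critical components of the LSTM architecture. In addition, we find that the hyperparameters studied are virtually independent of one another, and we propose guidelines for tuning them efficiently.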
Abstract
Several variants of the long short-term memory (LSTM) architecture for recurrent neural networks have been proposed since its inception in 1995. In recent years, these networks have become the state-of-the-art models for a variety of →