Several variants of the Long Short-Term Memory (LSTM) architecture for recurrent neural networks have been proposed since its inception in 1995. In recent years, these networks have become the state-of-the-art models for a variety of machine learning problems. This has led to a renewed interest in understanding the role and utility of various computational components of typical LSTM variants. In this paper, we present the first large-scale analysis of eight LSTM variants on three representative tasks: speech recognition, handwriting recognition, and polyphonic music modeling. The hyperparameters of all LSTM variants for each task were optimized separately using random search and their importance was assessed using the powerful fANOVA framework. In total, we summarize the results of 5400 experimental runs (about 15 years of CPU time), which makes our study the largest of its kind on LSTM networks. Our results show that none of the variants can improve upon the standard LSTM architecture significantly, and demonstrate the forget gate and the output activation function to be its most critical components. We further observe that the studied hyperparameters are virtually independent and derive guidelines for their efficient adjustment.

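This paper presents a large-scale analysis of the role and utility of the various computational components of typical Long Short-Term Memory (LSTM) networks, comparing eight LSTM variants on three representative tasks. The results show that the forget gate and the output activation function are the most critical components of the LSTM architecture. In addition, we find that the studied hyperparameters are virtually independent, and we derive guidelines for adjusting them efficiently.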
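
To make the two components singled out above concrete, the following is a minimal NumPy sketch of a single LSTM time step. It is an illustration under stated assumptions, not the implementation used in the study: the function name `lstm_step`, the stacked weight layout (`W`, `R`, `b`), and the `use_forget_gate` and `output_activation` switches are all hypothetical, and peephole connections are omitted for brevity. Disabling the forget gate is modeled here by fixing it to 1, and removing the output activation by passing the identity function.

```python
import numpy as np


def sigmoid(x):
    """Logistic sigmoid used for the gates."""
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h_prev, c_prev, W, R, b,
              use_forget_gate=True, output_activation=np.tanh):
    """One LSTM time step without peepholes (illustrative sketch).

    Assumed layout (hypothetical, chosen for this example):
      W: (4H, D) input weights, R: (4H, H) recurrent weights, b: (4H,) biases,
      stacked in the order block input, input gate, forget gate, output gate.
    """
    H = h_prev.shape[0]
    pre = W @ x + R @ h_prev + b           # all four pre-activations at once

    z = np.tanh(pre[0:H])                  # block input
    i = sigmoid(pre[H:2 * H])              # input gate
    if use_forget_gate:
        f = sigmoid(pre[2 * H:3 * H])      # forget gate
    else:
        f = np.ones(H)                     # no forget gate: cell state is never decayed
    o = sigmoid(pre[3 * H:4 * H])          # output gate

    c = i * z + f * c_prev                 # new cell state
    h = o * output_activation(c)           # block output
    return h, c


if __name__ == "__main__":
    D, H = 3, 2
    rng = np.random.default_rng(0)
    W = rng.normal(size=(4 * H, D))
    R = rng.normal(size=(4 * H, H))
    b = np.zeros(4 * H)
    h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, R, b)
    print(h.shape, c.shape)
```

With the default `tanh` output activation the block output stays inside (-1, 1) even when the cell state grows large; with the identity function it is bounded only by the cell state itself. This is one intuitive reason, offered here as an interpretation, why dropping either component can change the network's behavior.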

LSTM: A Search Space Odyssey