Vanishing long-term gradients are a major issue in training standard
recurrent neural networks (RNNs); long short-term memory (LSTM) models
alleviate this issue with memory cells. However, the extra parameters
associated with the memory cells mean an LSTM layer has four times as