Recurrent neural networks (RNNs) notoriously struggle to learn long-term memories, primarily due to vanishing and exploding gradients. The recent success of state-space models (SSMs), a subclass of RNNs, in overcoming such difficulties challenges our theoretical understanding. In this paper, we delve into the optimization challenges of RNNs and discover that, as the memory of a network increases, changes in its parameters result in increasingly large output variations, making gradient-based learning highly sensitive, even without exploding gradients. Our analysis further reveals the importance of the element-wise recurrence design pattern, combined with careful parametrizations, in mitigating this effect. This feature is present in SSMs, as well as in other architectures, such as LSTMs. Overall, our insights provide a new explanation for some of the difficulties in gradient-based learning of RNNs and for why some architectures perform better than others.
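The sensitivity claim can be illustrated with a minimal sketch. It assumes a scalar linear recurrence h_t = lam * h_{t-1} + x_t driven by a constant input x_t = 1; this toy setup, the name `output_and_sensitivity`, and the chosen values of `lam` are illustrative assumptions, not the setting analyzed in the paper. The code tracks the output together with its derivative with respect to the recurrent parameter `lam`.

```python
def output_and_sensitivity(lam, steps=20000):
    """Run h_t = lam * h_{t-1} + 1 and track dh_t/dlam alongside it.

    Assumption: scalar state and constant input, chosen only for illustration.
    """
    h = 0.0   # hidden state
    dh = 0.0  # derivative of the hidden state with respect to lam
    for _ in range(steps):
        # Chain rule on h_t = lam * h_{t-1} + 1. The update needs the
        # previous h, so dh is updated before h.
        dh = h + lam * dh
        h = lam * h + 1.0
    return h, dh


# Values of lam closer to 1 correspond to longer memory.
results = {lam: output_and_sensitivity(lam) for lam in (0.5, 0.9, 0.99, 0.999)}
for lam, (h, dh) in results.items():
    # lam ** 100 is the gradient of the state with respect to the state
    # 100 steps earlier; it never exceeds 1 here, so nothing explodes.
    print(f"lam={lam}: h={h:.1f}, dh/dlam={dh:.1f}, lam**100={lam ** 100:.3g}")
```

In this toy case the output settles at 1 / (1 - lam), while its derivative with respect to `lam` settles at 1 / (1 - lam)^2. The parameter sensitivity therefore grows much faster than the output as `lam` approaches 1, even though the state-to-state gradient `lam ** k` stays bounded by 1.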

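Recurrent neural networks struggle to learn long-term memories, and the recent success of RNNs based on state-space models challenges our theoretical understanding. Our analysis reveals the importance of the element-wise recurrence design pattern and careful parametrization in mitigating this effect, a feature present in state-space models and in other architectures. Overall, our insights offer a new explanation for some of the difficulties in gradient-based learning of RNNs and for why some architectures perform better than others.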

Recurrent neural networks: vanishing and exploding gradients are not the end of the story