BriefGPT.xyz
Jan, 2019
Towards Non-saturating Recurrent Units for Modelling Long-term Dependencies
Sarath Chandar, Chinnadhurai Sankar, Eugene Vorontsov, Samira Ebrahimi Kahou, Yoshua Bengio
TL;DR
This paper proposes NRU, a new recurrent neural network architecture that relies on a memory mechanism and uses neither saturating activation functions nor saturating gates, in order to further mitigate the vanishing gradient problem. Across a range of synthetic and real-world tasks, NRU is shown to be the only model among the compared architectures that performs best on all tasks, both with and without long-term dependencies.
Abstract
Modelling long-term dependencies is a challenge for recurrent neural networks. This is primarily due to the fact that gradients vanish during training as the sequence length increases. Gradients can be attenuated …
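The abstract's claim that gradients vanish through saturating activations can be illustrated with a minimal NumPy sketch. This is a toy demo, not the paper's NRU: it backpropagates a gradient through a plain RNN with an orthogonal recurrent matrix (an assumption chosen so that any decay comes from the activation derivative alone), comparing a saturating tanh derivative against a non-saturating identity derivative.

```python
import numpy as np

# Illustrative sketch (NOT the paper's NRU): show how a saturating
# activation (tanh) attenuates gradients backpropagated through a
# simple RNN, compared with a non-saturating (identity) derivative.
rng = np.random.default_rng(0)
T, n = 50, 16

# Orthogonal recurrent weights, so the weight matrix itself neither
# shrinks nor grows gradient norms; any decay must come from the
# activation derivative.
W, _ = np.linalg.qr(rng.standard_normal((n, n)))

def backprop_norm(activation_deriv):
    """Norm of dL/dh_0 after backpropagating through T steps."""
    h = rng.standard_normal(n)
    grad = np.ones(n)
    for _ in range(T):
        pre = W @ h
        # One backward step: dL/dh_{t-1} = W^T (phi'(pre) * dL/dh_t)
        grad = W.T @ (activation_deriv(pre) * grad)
        h = np.tanh(pre)  # keep the forward state bounded
    return np.linalg.norm(grad)

tanh_grad = backprop_norm(lambda x: 1.0 - np.tanh(x) ** 2)  # saturating: phi' < 1
identity_grad = backprop_norm(lambda x: np.ones_like(x))    # non-saturating: phi' = 1

print(tanh_grad, identity_grad)
```

Because tanh's derivative is strictly below 1 away from zero, the gradient norm shrinks multiplicatively at every step, while the identity derivative preserves it exactly through the orthogonal transition; this gap is the motivation the abstract gives for avoiding saturating units.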