After a more than decade-long period of relatively little research activity in the area of recurrent neural networks, several new developments will be reviewed here that have allowed substantial progress both in understanding and in technical solutions towards more efficient training of recurrent networks. These advances have been motivated by and related to the optimization issues surrounding deep learning. Although recurrent networks are extremely powerful in what they can in principle represent in terms of modelling sequences,their training is plagued by two aspects of the same issue regarding the learning of long-term dependencies. Experiments reported here evaluate the use of clipping gradients, spanning longer time ranges with leaky integration, advanced momentum techniques, using more powerful output probability models, and encouraging sparser gradients to help symmetry breaking and credit assignment. The experiments are performed on text and music data and show off the combined effects of these techniques in generally improving both training and test error.

本文探讨了相关的优化问题，尝试使用梯度削减，跨越更长的时间范围，强化动量技术，使用更强大的输出概率模型，以及鼓励更稀疏的梯度来帮助对称性打破和学分分配等几个方面，以提高长序列的训练的可行性和效率。实验结果在文本和音乐数据的训练和测试误差中表现出了显著的进步。

循环神经网络优化的进展