Machine learning architectures, including transformers and recurrent neural networks (RNNs), have revolutionized forecasting in applications ranging from text processing to extreme weather. Notably, advanced network architectures tuned for applications such as natural language processing are transferable to other tasks, such as spatiotemporal forecasting. However, there is a scarcity of ablation studies that identify the key components responsible for this forecasting accuracy. The absence of such studies, although explainable by the associated computational cost, reinforces the belief that these models should be treated as black boxes. In this work, we decompose the key architectural components of the most powerful neural architectures, namely gating and recurrence in RNNs and attention mechanisms in transformers. We then synthesize novel hybrid architectures from these standard blocks and perform ablation studies to identify which mechanisms are effective for each task. The importance of treating these components as hyperparameters that can augment the standard architectures is demonstrated on a range of forecasting datasets, from the spatiotemporal chaotic dynamics of the multiscale Lorenz 96 system and the Kuramoto-Sivashinsky equation to standard real-world time-series benchmarks. A key finding is that neural gating and attention improve the performance of all standard RNNs in most tasks, whereas adding a notion of recurrence to transformers is detrimental. Furthermore, our study reveals that a novel, sparsely used architecture that integrates Recurrent Highway Networks with neural gating and attention mechanisms emerges as the best-performing architecture in high-dimensional spatiotemporal forecasting of dynamical systems.
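The abstract singles out gating and recurrence as RNN components and names Recurrent Highway Networks (RHNs) as the basis of the best-performing hybrid. As a minimal sketch, assuming the standard RHN formulation with a coupled carry gate (c = 1 - t), the forward pass of one cell can be written in NumPy. The names `RHNCell` and `run`, the random initialization, the negative transform-gate bias and the `depth` parameter are illustrative assumptions, not the authors' implementation; the additional neural gating and attention mechanisms studied in the paper are not shown.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class RHNCell:
    """Hypothetical RHN cell sketch with a coupled carry gate (c = 1 - t).

    Weights are random; there is no training loop.
    """

    def __init__(self, input_dim, hidden_dim, depth=2, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(hidden_dim)
        self.depth = depth
        # Input projections enter only at the first micro-layer of each step.
        self.W_H = rng.normal(0.0, scale, (hidden_dim, input_dim))
        self.W_T = rng.normal(0.0, scale, (hidden_dim, input_dim))
        # Each micro-layer has its own recurrent weights and biases.
        self.R_H = rng.normal(0.0, scale, (depth, hidden_dim, hidden_dim))
        self.R_T = rng.normal(0.0, scale, (depth, hidden_dim, hidden_dim))
        self.b_H = np.zeros((depth, hidden_dim))
        # Assumed negative transform-gate bias: initially favours carrying the state.
        self.b_T = np.full((depth, hidden_dim), -1.0)

    def step(self, x, s_prev):
        s = s_prev
        for l in range(self.depth):
            in_h = self.W_H @ x if l == 0 else 0.0
            in_t = self.W_T @ x if l == 0 else 0.0
            h = np.tanh(in_h + self.R_H[l] @ s + self.b_H[l])  # candidate state
            t = sigmoid(in_t + self.R_T[l] @ s + self.b_T[l])  # transform gate
            # Highway update: t blends the candidate with the carried state.
            s = t * h + (1.0 - t) * s
        return s


def run(cell, inputs, hidden_dim):
    """Unroll the cell over a sequence and return all hidden states."""
    s = np.zeros(hidden_dim)
    states = []
    for x in inputs:
        s = cell.step(x, s)
        states.append(s)
    return np.stack(states)


# Usage: 10 time steps of a 3-dimensional input, 8 hidden units, recurrence depth 3.
cell = RHNCell(input_dim=3, hidden_dim=8, depth=3)
xs = np.random.default_rng(1).normal(size=(10, 3))
hs = run(cell, xs, 8)
print(hs.shape)  # (10, 8)
```

With `depth=1` the cell reduces to a single highway-gated recurrent layer. Larger depths stack several highway micro-layers inside one time step, which is the "recurrence depth" idea behind RHNs. Because each update is a convex combination of a `tanh` candidate and the previous state, the hidden state stays strictly inside (-1, 1) when started from zero.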

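This study addresses the lack of clear validation of how well machine learning architectures transfer to forecasting dynamical systems. By decomposing and recombining the key components of RNNs and transformers, the study finds that neural gating and attention mechanisms generally improve RNN performance, whereas introducing a notion of recurrence into transformers is counterproductive. The results show that a novel architecture built on Recurrent Highway Networks performs best in high-dimensional spatiotemporal forecasting.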

Deconstructing Recurrence, Attention, and Gating: Investigating the transferability of Transformers and Gated Recurrent Neural Networks in forecasting of dynamical systems
