We propose a fresh take on understanding the mechanisms of neural networks by analyzing the rich structure of the parameters along their optimization trajectories. To this end, we introduce natural notions of trajectory complexity, both qualitative and quantitative, which reveal the nuanced interplay between optimization choices such as momentum, weight decay, and batch size. We use these notions to identify key hallmarks of optimization in deep neural networks: when it goes right, and when it ends up in a dead end. Further, this trajectory perspective uncovers an intertwined behaviour of momentum and weight decay that promotes directional exploration, as well as a directional regularization behaviour of some other optimization choices. We perform experiments in large-scale vision and language settings, including large language models (LLMs) with up to 12 billion parameters, to demonstrate the value of our approach.

Hallmarks of Optimization Trajectories in Neural Networks and LLMs: The Lengths, Bends, and Dead Ends