We propose an empirical approach centered on the spectral dynamics of weights -- the behavior of singular values and vectors during optimization -- to unify and clarify several phenomena in deep learning. We identify a consistent bias in optimization across various experiments, from small-scale ``grokking'' to large-scale tasks like image classification with ConvNets, image generation with UNets, speech recognition with LSTMs, and language modeling with Transformers. We also demonstrate that weight decay enhances this bias beyond its role as a norm regularizer, even in practical systems. Moreover, we show that these spectral dynamics distinguish memorizing networks from generalizing ones, offering a novel perspective on this longstanding conundrum. Additionally, we leverage spectral dynamics to explore the emergence of well-performing sparse subnetworks (lottery tickets) and the structure of the loss surface through linear mode connectivity. Our findings suggest that spectral dynamics provide a coherent framework to better understand the behavior of neural networks across diverse settings.
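As a rough illustration of what tracking spectral dynamics can look like in practice, the sketch below records the singular values of a weight matrix at every step of gradient descent. It is not the authors' experimental setup: the two-layer linear network, the rank-2 regression target, and the names `singular_values` and `train_and_track` are all assumptions chosen only to keep the example small and self-contained.

```python
import numpy as np


def singular_values(weight):
    """Return the singular values of a 2-D weight matrix, largest first."""
    return np.linalg.svd(weight, compute_uv=False)


def train_and_track(steps=200, lr=0.05, dim=8, seed=0):
    """Train a toy two-layer linear net and log the spectrum of its first layer.

    Hypothetical setup (not from the paper): regress onto a rank-2 linear map
    with plain full-batch gradient descent.
    """
    rng = np.random.default_rng(seed)
    target = rng.normal(size=(dim, 2)) @ rng.normal(size=(2, dim)) / np.sqrt(dim)
    x = rng.normal(size=(256, dim))
    y = x @ target.T

    # Small random initialization for both layers.
    w1 = 0.1 * rng.normal(size=(dim, dim))
    w2 = 0.1 * rng.normal(size=(dim, dim))

    spectra, losses = [], []
    for _ in range(steps):
        hidden = x @ w1.T
        err = hidden @ w2.T - y

        # Log the loss and the first-layer spectrum before the update.
        losses.append(float(np.mean(err ** 2)))
        spectra.append(singular_values(w1))

        # Gradients of the per-sample-averaged half squared error.
        grad_w2 = err.T @ hidden / len(x)
        grad_w1 = (err @ w2).T @ x / len(x)
        w1 -= lr * grad_w1
        w2 -= lr * grad_w2

    return np.array(spectra), np.array(losses)


if __name__ == "__main__":
    spectra, losses = train_and_track()
    print("first-step spectrum:", np.round(spectra[0], 3))
    print("last-step spectrum: ", np.round(spectra[-1], 3))
```

Here `spectra[t]` holds the singular values of the first-layer weights at step `t`, so plotting each column against `t` gives one trajectory per singular value. Tracking singular vectors as well would require keeping the full output of `np.linalg.svd` rather than only the values.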

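This work addresses the need to unify and clarify several phenomena in deep learning by proposing an empirical approach centered on the spectral dynamics of weights. It finds that spectral dynamics not only distinguish memorizing networks from generalizing ones, but also help explain the emergence of sparse subnetworks (lottery tickets) and the structure of the loss surface. The results provide a coherent framework for understanding the behavior of neural networks across diverse settings.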

Studying Deep Learning through the Spectral Dynamics of Weights