This paper is the second in the series Commutative Scaling of Width and Depth (WD), which studies the commutativity of the infinite-width and infinite-depth limits in deep neural networks. Our aim is to understand the behaviour of neural functions (functions that depend on a neural network model) as width and depth go to infinity (in some sense), and eventually to identify settings under which commutativity holds, i.e. the neural function tends to the same limit no matter how the width and depth limits are taken. In this paper, we formally introduce and define the commutativity framework and discuss its implications for neural network design and scaling. We study commutativity for the neural covariance kernel, which reflects how network layers separate data. Our findings extend previous results established in [55] by showing that taking the width and depth to infinity in a deep neural network with skip connections, when branches are suitably scaled to avoid exploding behaviour, results in the same covariance structure no matter how that limit is taken. This has a number of theoretical and practical implications that we discuss in the paper. The proof techniques in this paper are novel and rely on tools that are more accessible to readers who are not familiar with stochastic calculus (used in the proofs of WD(I)).

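This paper studies the commutativity of the infinite-width and infinite-depth limits in deep neural networks, introduces and defines the commutativity framework, and discusses its implications for neural network design and scaling. By studying commutativity for the neural covariance kernel, it shows that in deep neural networks with skip connections whose branches are suitably scaled to avoid exploding behaviour, the covariance structure tends to the same limit as width and depth grow without bound. These findings have theoretical and practical implications. The paper uses novel proof techniques that rely on more accessible tools, making it easier to follow for readers unfamiliar with stochastic calculus (used in the proofs of WD(I)).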

Commutative Scaling of Width and Depth in Deep Neural Networks