Local learning, which trains a network through layer-wise local targets and losses, has been studied as an alternative to backpropagation (BP) in neural computation. However, locality often makes its algorithms more complex or introduces additional hyperparameters, which makes it difficult to identify settings in which training progresses stably. To provide theoretical and quantitative insights, we introduce the maximal update parameterization ($\mu$P) in the infinite-width limit for two representative designs of local targets: predictive coding (PC) and target propagation (TP). We verify that $\mu$P enables hyperparameter transfer across models of different widths. Furthermore, our analysis reveals properties of $\mu$P that are specific to local learning and absent in conventional BP. By analyzing deep linear networks, we find that PC's gradients interpolate between first-order and Gauss-Newton-like gradients, depending on the parameterization. We demonstrate that, in specific standard settings, PC in the infinite-width limit behaves more like the first-order gradient. For TP, even with the standard scaling of the last layer, which differs from classical $\mu$P, its local loss optimization favors the feature learning regime over the kernel regime.

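This work addresses the complexity and hyperparameter-tuning challenges of local learning algorithms in neural computation by introducing a maximal update parameterization ($\mu$P) for local-target designs such as predictive coding and target propagation. Through an analysis of deep linear networks, we find that $\mu$P exhibits distinctive properties in the infinite-width limit: it allows hyperparameters to transfer across models of different widths, and in specific settings predictive coding behaves more like the first-order gradient. These findings bear on both the understanding and the application of local loss optimization.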

Local Loss Optimization in the Infinite Width: Stable Parameterization of Predictive Coding Networks and Target Propagation