We present PSiLON Net, an MLP architecture that uses $L_1$ weight normalization for each weight vector and shares the length parameter across the layer. The 1-path-norm bounds the Lipschitz constant of a neural network and is indicative of its generalizability. We show how PSiLON Net's design drastically simplifies the 1-path-norm while providing an inductive bias towards efficient learning and near-sparse parameters. We propose a pruning method to achieve exact sparsity in the final stages of training, if desired. To exploit the inductive bias of residual networks, we present a simplified residual block that leverages concatenated ReLU activations. For networks constructed with such blocks, we prove that considering only a subset of possible paths in the 1-path-norm is sufficient to bound the Lipschitz constant. Using the 1-path-norm and this improved bound as regularizers, we conduct experiments in the small data regime using overparameterized PSiLON Nets and PSiLON ResNets, demonstrating reliable optimization and strong performance.
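
To illustrate why sharing one length parameter per layer can simplify the 1-path-norm, the following is a minimal sketch and not the authors' implementation. It assumes that each "weight vector" is a row of the weight matrix (the incoming weights of one unit), that biases are omitted, and that the 1-path-norm is the sum over all input-output paths of the product of absolute weights. The names `l1_normalize_rows`, `one_path_norm`, `V1`, `V2`, `g1`, and `g2` are hypothetical. Under these assumptions every row of a layer has $L_1$ norm $|g|$, so the path sum collapses to the number of outputs times the product of the $|g|$ values.

```python
def l1_normalize_rows(V, g):
    """Rescale each row of V to L1 norm |g|; g is one scalar shared by the layer."""
    return [[g * v / sum(abs(u) for u in row) for v in row] for row in V]


def one_path_norm(weights):
    """1-path-norm of a bias-free MLP, computed as 1^T |W_L| ... |W_1| 1.

    `weights` is a list of matrices (lists of rows), ordered from input to output.
    """
    vec = [1.0] * len(weights[0][0])  # all-ones vector over the inputs
    for W in weights:
        # Propagate through the entrywise absolute value of each layer.
        vec = [sum(abs(w) * x for w, x in zip(row, vec)) for row in W]
    return sum(vec)


# Hypothetical direction parameters and shared per-layer lengths.
V1 = [[1.0, -3.0], [2.0, 2.0], [-0.5, 1.5]]  # 3 hidden units, 2 inputs
V2 = [[1.0, 1.0, -2.0], [4.0, -1.0, 3.0]]    # 2 outputs, 3 hidden units
g1, g2 = 0.5, -2.0

W1 = l1_normalize_rows(V1, g1)
W2 = l1_normalize_rows(V2, g2)

full = one_path_norm([W1, W2])             # explicit sum over all paths
closed_form = len(W2) * abs(g1) * abs(g2)  # outputs times product of |g|
print(full, closed_form)
```

In this sketch the explicit path sum and the closed form agree, so a regularizer built this way would depend only on the per-layer lengths rather than on every individual weight.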

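This work proposes PSiLON Net, an MLP architecture that applies $L_1$ weight normalization to each weight vector and shares the length parameter within each layer. The design simplifies the 1-path-norm and provides an inductive bias towards efficient learning and near-sparse parameters. A pruning method is also proposed to achieve exact sparsity in the final stages of training. Using the 1-path-norm and its improved bound as regularizers, experiments with overparameterized PSiLON Nets and PSiLON ResNets in the small data regime demonstrate reliable optimization and strong performance.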

Hidden Synergy: $L_1$ Weight Normalization and 1-Path-Norm Regularization