In the pursuit of explaining implicit regularization in deep learning, much attention has been devoted to matrix and tensor factorizations, which correspond to simplified neural networks. These models were shown to exhibit implicit regularization towards low matrix and tensor rank, respectively. Drawing closer to practical deep learning, the current paper theoretically analyzes the implicit regularization in hierarchical tensor factorization, a model equivalent to certain deep convolutional neural networks. Through a dynamical systems lens, we overcome challenges associated with hierarchy and establish implicit regularization towards low hierarchical tensor rank. This translates to an implicit regularization towards locality for the associated convolutional networks. Inspired by our theory, we design explicit regularization discouraging locality, and demonstrate its ability to improve the performance of modern convolutional networks on non-local tasks, contrary to the conventional wisdom that architectural changes are needed. Our work highlights the potential of enhancing neural networks via theoretical analysis of their implicit regularization.
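To make "hierarchical tensor rank" and its link to locality concrete, here is a minimal pure-Python sketch built on simplifying assumptions of our own. It uses an order-4 tensor and a fixed binary tree whose internal nodes are {0,1} and {2,3}. Each node combines its children as a weighted sum of tensor products of matching components. All names (`basis`, `combine`, `matricization_rank`) are hypothetical, and this is not the paper's implementation. Roughly, hierarchical tensor rank refers to the ranks of the tensor's matricizations with respect to the index sets of the tree's nodes. Because the root below uses a single component, the matricization across the contiguous partition {0,1} | {2,3} has rank 1. Across the interleaved partition {0,2} | {1,3}, the same tensor has rank 9.

```python
from fractions import Fraction
from itertools import product

D = 3  # dimension of every tensor mode (toy size, hypothetical)


def basis(r):
    """Leaf vector e_r stored as an order-1 tensor: {(i,): value}."""
    return {(i,): int(i == r) for i in range(D)}


def combine(weights, left, right):
    """One node of the (simplified, assumed) hierarchical factorization.

    left[r], right[r]: r-th component tensors of the two children (dicts).
    Returns out[s] = sum_r weights[r][s] * (left[r] tensor-product right[r]).
    """
    out = []
    for s in range(len(weights[0])):
        t = {}
        for r in range(len(weights)):
            for il, vl in left[r].items():
                for ir, vr in right[r].items():
                    t[il + ir] = t.get(il + ir, 0) + weights[r][s] * vl * vr
        out.append(t)
    return out


def rank(mat):
    """Exact matrix rank via Gaussian elimination over Fractions."""
    mat = [row[:] for row in mat]
    r = 0
    for c in range(len(mat[0])):
        pivot = next((i for i in range(r, len(mat)) if mat[i][c] != 0), None)
        if pivot is None:
            continue
        mat[r], mat[pivot] = mat[pivot], mat[r]
        for i in range(len(mat)):
            if i != r and mat[i][c] != 0:
                f = mat[i][c] / mat[r][c]
                mat[i] = [a - f * b for a, b in zip(mat[i], mat[r])]
        r += 1
        if r == len(mat):
            break
    return r


def matricization_rank(tensor, row_modes):
    """Rank of the matrix whose rows are indexed by row_modes, columns by the rest."""
    order = len(next(iter(tensor)))
    col_modes = [m for m in range(order) if m not in row_modes]
    rows = []
    for ri in product(range(D), repeat=len(row_modes)):
        row = []
        for ci in product(range(D), repeat=len(col_modes)):
            idx = [0] * order
            for m, i in zip(row_modes, ri):
                idx[m] = i
            for m, i in zip(col_modes, ci):
                idx[m] = i
            row.append(Fraction(tensor[tuple(idx)]))
        rows.append(row)
    return rank(rows)


# Leaves {0},{1},{2},{3}: three components each, component r is e_r.
leaves = [basis(r) for r in range(D)]
# Internal nodes {0,1} and {2,3}: one output component summing all three inputs.
node01 = combine([[1]] * D, leaves, leaves)
node23 = combine([[1]] * D, leaves, leaves)
# Root with a single component: W[i0,i1,i2,i3] = delta(i0,i1) * delta(i2,i3).
W = combine([[1]], node01, node23)[0]

print(matricization_rank(W, [0, 1]))  # contiguous {0,1} | {2,3}: prints 1
print(matricization_rank(W, [0, 2]))  # interleaved {0,2} | {1,3}: prints 9
```

Reading tensor modes as input regions of the corresponding convolutional network, a low rank across a partition means weak interaction between the two regions. Here the tensor couples only neighbouring modes. This toy case illustrates, under the stated assumptions, the sense in which low hierarchical tensor rank corresponds to locality.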


Implicit Regularization in Hierarchical Tensor Factorization and Locality of Deep Convolutional Neural Networks