The generalization capacity of deep neural networks has been studied in a variety of ways, including at least two distinct categories of approach: one based on the shape of the loss landscape in parameter space, and the other based on the structure of the representation manifold in feature space (that is, in the space of unit activities). These two approaches are related, but they are rarely studied together and explicitly connected. Here, we present a simple analysis that makes such a connection. We show that, in the final phase of learning in deep neural networks, compression of the volume of the manifold of neural representations correlates with the flatness of the loss around the minima explored by ongoing parameter optimization. We show that this is predicted by a relatively simple mathematical relationship: loss flatness implies compression of neural representations. Our results build closely on prior work of \citet{ma_linear_2021}, which shows how flatness (i.e., small eigenvalues of the loss Hessian) develops in late phases of learning and leads to robustness to perturbations in network inputs. Moreover, we show that there is no similarly direct connection between local dimensionality and sharpness, suggesting that local dimensionality may be controlled by different mechanisms than volume and hence may play a complementary role in neural representations. Overall, we advance a dual perspective on generalization in neural networks, in both parameter space and feature space.

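The generalization capacity of deep neural networks has been studied through two distinct approaches, one based on the shape of the loss landscape in parameter space and the other on the structure of the representation manifold in feature space (that is, the space of unit activities), but the two are rarely studied together and explicitly connected. We present a simple analysis that establishes a connection between them, and we show results indicating that, in the final phase of learning in deep neural networks, compression of the volume of the neural representation manifold correlates with the flatness of the loss around the minima explored during parameter optimization.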

A Simple Connection from Loss Flatness to Compressed Representations in Neural Networks