The utility of a learned neural representation depends on how well its geometry supports performance in downstream tasks. This geometry depends on the structure of the inputs, the structure of the target outputs, and the architecture of the network. By studying the learning dynamics of networks with one hidden layer, we discovered that the network's activation function has an unexpectedly strong impact on the representational geometry: Tanh networks tend to learn representations that reflect the structure of the target outputs, while ReLU networks retain more information about the structure of the raw inputs. This difference is consistently observed across a broad class of parameterized tasks in which we modulated the degree of alignment between the geometry of the task inputs and that of the task labels. We analyze the learning dynamics in weight space and show that the differences between Tanh and ReLU networks arise from the asymmetric asymptotic behavior of ReLU, which leads feature neurons to specialize for different regions of input space. By contrast, feature neurons in Tanh networks tend to inherit the task label structure. Consequently, when the target outputs are low dimensional, Tanh networks generate neural representations that are more disentangled than those obtained with a ReLU nonlinearity. Our findings shed light on the interplay between input-output geometry, nonlinearity, and learned representations in neural networks.
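The comparison described above can be probed with a small experiment. The following is a minimal sketch, not the authors' code: the task (an XOR-like label on Gaussian inputs), the network size, the optimizer settings, and the use of linear centered kernel alignment (CKA) as the similarity measure are all illustrative assumptions, and the names `make_task`, `train_hidden`, and `linear_cka` are hypothetical helpers.

```python
import numpy as np

def linear_cka(A, B):
    """Linear centered kernel alignment between two feature matrices (rows = samples)."""
    A = A - A.mean(axis=0, keepdims=True)
    B = B - B.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(A.T @ B, "fro") ** 2
    return cross / (np.linalg.norm(A.T @ A, "fro") * np.linalg.norm(B.T @ B, "fro"))

def make_task(n=200, d_in=10, seed=0):
    """Assumed toy task: Gaussian inputs, label = sign(x0 * x1), so input and
    label geometry are poorly aligned."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d_in))
    Y = np.sign(X[:, :1] * X[:, 1:2])
    return X, Y

def activate(Z, act):
    return np.tanh(Z) if act == "tanh" else np.maximum(Z, 0.0)

def train_hidden(X, Y, act, n_hidden=64, lr=0.05, steps=2000, seed=0):
    """Train a one-hidden-layer network with full-batch gradient descent on
    mean squared error and return the hidden-layer activations."""
    rng = np.random.default_rng(seed)
    n, d_in = X.shape
    d_out = Y.shape[1]
    W1 = rng.normal(0.0, 1.0 / np.sqrt(d_in), (d_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, d_out))
    b2 = np.zeros(d_out)
    for _ in range(steps):
        Z = X @ W1 + b1
        H = activate(Z, act)
        err = (H @ W2 + b2 - Y) / n          # gradient of the loss w.r.t. the output
        dH = err @ W2.T                       # backpropagate before updating W2
        dZ = dH * ((1.0 - H ** 2) if act == "tanh" else (Z > 0))
        W2 -= lr * (H.T @ err)
        b2 -= lr * err.sum(axis=0)
        W1 -= lr * (X.T @ dZ)
        b1 -= lr * dZ.sum(axis=0)
    return activate(X @ W1 + b1, act)

if __name__ == "__main__":
    X, Y = make_task()
    for act in ("tanh", "relu"):
        H = train_hidden(X, Y, act)
        print(act,
              "alignment with inputs:", round(linear_cka(H, X), 3),
              "alignment with labels:", round(linear_cka(H, Y), 3))
```

Under the claim in the text, one would expect the Tanh representation to align more with the labels and the ReLU representation to retain more alignment with the inputs, but the outcome of this toy setup depends on the assumed task and hyperparameters and is not a reproduction of the reported results.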

Task structure and nonlinearity jointly determine learned representational geometry