Individual neurons in neural networks often represent a mixture of unrelated features. This phenomenon, called polysemanticity, can make interpreting neural networks more difficult, so we aim to understand its causes. We propose doing so through the lens of feature \emph{capacity}, which is the fractional dimension each feature consumes in the embedding space. We show that in a toy model the optimal capacity allocation tends to monosemantically represent the most important features, polysemantically represent less important features (in proportion to their impact on the loss), and entirely ignore the least important features. Polysemanticity is more prevalent when the inputs have higher kurtosis or sparsity, and it is more prevalent in some architectures than in others. Given an optimal allocation of capacity, we go on to study the geometry of the embedding space. We find a block-semi-orthogonal structure, with differing block sizes in different models, highlighting the impact of model architecture on the interpretability of its neurons.
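To make "fractional dimension per feature" concrete, here is a minimal sketch that computes capacities from an embedding matrix. It assumes the formalization $C_i = (W_i \cdot W_i)^2 / \sum_j (W_i \cdot W_j)^2$, where $W_i$ is the embedding vector of feature $i$; this formula is an assumption here, since the abstract gives only the verbal definition. The function name `feature_capacities` and the toy matrix are hypothetical illustrations. The example shows one monosemantic feature (capacity 1), two features sharing one dimension polysemantically (capacity 1/2 each), and one ignored feature (capacity 0), arranged in two mutually orthogonal blocks.

```python
import numpy as np


def feature_capacities(W: np.ndarray) -> np.ndarray:
    """Capacity of each feature for an embedding matrix W of shape (n_features, d).

    Assumed definition: C_i = (W_i . W_i)^2 / sum_j (W_i . W_j)^2.
    Features with an all-zero embedding get capacity 0 (they are ignored).
    """
    gram = W @ W.T                    # gram[i, j] = W_i . W_j
    num = np.diag(gram) ** 2          # (W_i . W_i)^2
    den = np.sum(gram ** 2, axis=1)   # sum_j (W_i . W_j)^2
    caps = np.zeros_like(num, dtype=float)
    nonzero = den > 0
    caps[nonzero] = num[nonzero] / den[nonzero]
    return caps


# Hypothetical toy embedding: 4 features in d = 2 dimensions.
W = np.array([
    [1.0, 0.0],    # feature 0: has its own dimension (monosemantic)
    [0.0, 1.0],    # features 1 and 2 share the second dimension
    [0.0, -1.0],   #   with opposite signs (polysemantic)
    [0.0, 0.0],    # feature 3: not embedded (ignored)
])
caps = feature_capacities(W)
print(caps)        # capacities 1.0, 0.5, 0.5, 0.0
print(caps.sum())  # 2.0, equal to the embedding dimension d = 2
```

In this toy case the blocks {0} and {1, 2} are orthogonal to each other, a tiny instance of the block-semi-orthogonal structure described above, and the capacities add up to the number of embedding dimensions.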

Polysemanticity and Capacity in Neural Networks