In the past 20 years, artificial neural networks have become dominant in
various areas, continually growing in scale. However, the current analysis of
large models has mainly focused on functionality, overlooking the influence of
scale differences on their properties. To address this, we propose the concept
of Emergence Learning, which emphasizes the significance of scale. By studying
models of different scales, we have identified a key factor in achieving higher
performance in large models: the decrease of monosemantic neurons. Building on
this insight, we propose a proactive approach to inhibit monosemanticity for
improved performance. Our solution involves a two-phase process that includes
monosemantic neuron detection and inhibition, supported by theoretical
analysis. Experimental results on various tasks and neural networks demonstrate
the effectiveness of our proposed method.
Following the idea of Emergence Learning, though drawing inspiration from
scaling phenomena, the applicability of our method is not restricted to large
scale alone. Therefore, the experiment is self-contained. However, extending
this research to very large-scale datasets is appealing yet impossible for
research departments due to limited resources. We are delighted to share the
first co-authorship and eagerly await collaboration from any AI company before
submission.

通过研究不同规模的模型，我们发现在大型模型中达到更高性能的关键因素是单语义神经元的减少，提出了一种主动抑制单语义性的两阶段方法，并通过理论分析和实验证明了其有效性。该方法的适用性不限于大规模，但对于研究部门来说，将该研究扩展至非常大规模的数据集是吸引人的，但受到资源限制而不可能实现，期待 AI 公司的合作。

崛起学习：由新兴能力和单义性基础的研究

Emergence Learning: A Rising Direction from Emergent Abilities and a  Monosemanticity-Based Study

In some neural networks, individual neurons correspond to natural
``features'' in the input. Such \emph{monosemantic} neurons are of great help
in interpretability studies, as they can be cleanly understood. In this work we
report preliminary attempts to engineer monosemanticity in toy models. We find
that models can be made more monosemantic without increasing the loss by just
changing which local minimum the training process finds. More monosemantic loss
minima have moderate negative biases, and we are able to use this fact to
engineer highly monosemantic models. We are able to mechanistically interpret
these models, including the residual polysemantic neurons, and uncover a simple
yet surprising algorithm. Finally, we find that providing models with more
neurons per layer makes the models more monosemantic, albeit at increased
computational cost. These findings point to a number of new questions and
avenues for engineering monosemanticity, which we intend to study these in
future work.

本文尝试利用训练过程中的局部最小值改变神经元内在特征，以提高神经网络的可解释性并减少偏差，并发现每层神经元数量的增加可以提高单语性，但会增加计算成本。