The Garden of Forking Paths: Observing Dynamic Parameters Distribution in Large Language Models

Mar, 2024

Carlo Nicolini, Jacopo Staiano, Bruno Lepri, Raffaele Marino

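TL;DR: This paper proposes that observing how the statistical distribution of model parameters evolves over time, and in particular the bifurcation effects that appear in it, can help explain what drives model quality, which could reduce training cost and evaluation effort. It also demonstrates in practice the effectiveness of weight sparsification.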

Abstract

A substantial gap persists in understanding the reasons behind the exceptional performance of the Transformer architecture in NLP. A particularly unexplored area involves the mechanistic description of how the di