Post-Training Quantization (PTQ) enhances the efficiency of Large Language
Models (LLMs) by enabling faster operation and compatibility with more
accessible hardware through reduced memory usage, at the cost of small
performance drops. We explore the role of calibration sets in PTQ, specifically
their effect on hidden activations in various notable open-source LLMs.
Calibration sets are crucial for evaluating activation magnitudes and
identifying outliers, which can distort the quantization range and negatively
impact performance. Our analysis reveals a marked contrast in quantization
effectiveness across models. The older OPT model, which much of the
quantization literature is based on, shows significant performance
deterioration and high susceptibility to outliers with varying calibration
sets. In contrast, newer models like Llama-2 7B, Llama-3 8B, Command-R 35B, and
Mistral 7B demonstrate strong robustness, with Mistral 7B showing near-immunity
to outliers and stable activations. These findings suggest a shift in PTQ
strategies might be needed. As advancements in pre-training methods reduce the
relevance of outliers, there is an emerging need to reassess the fundamentals
of current quantization literature. The emphasis should pivot towards
optimizing inference speed, rather than primarily focusing on outlier
preservation, to align with the evolving characteristics of state-of-the-art
LLMs.
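
To make the claim about outliers distorting the quantization range concrete, here is a minimal sketch, assuming simple symmetric absmax int8 quantization over synthetic Gaussian "activations". The helper names (`absmax_quantize`, `mean_abs_error`), the synthetic data, and the outlier value of 60.0 are all illustrative assumptions; this is not the calibration or quantization procedure used in the paper.

```python
import random

def absmax_quantize(values, num_bits=8):
    """Symmetric absmax quantization: the scale is set by the largest magnitude."""
    qmax = 2 ** (num_bits - 1) - 1              # 127 for int8
    scale = max(abs(v) for v in values) / qmax
    quantized = [round(v / scale) for v in values]
    dequantized = [q * scale for q in quantized]
    return dequantized, scale

def mean_abs_error(original, reconstructed):
    """Average absolute reconstruction error after quantize/dequantize."""
    return sum(abs(a - b) for a, b in zip(original, reconstructed)) / len(original)

random.seed(0)
# Hypothetical calibration activations: 1000 values drawn from N(0, 1).
activations = [random.gauss(0.0, 1.0) for _ in range(1000)]
# The same activations plus a single large outlier (assumed value).
with_outlier = activations + [60.0]

deq_clean, scale_clean = absmax_quantize(activations)
deq_outlier, scale_outlier = absmax_quantize(with_outlier)

err_clean = mean_abs_error(activations, deq_clean)
# Error on the same 1000 typical values, now using the outlier-driven scale.
err_outlier = mean_abs_error(activations, deq_outlier[:1000])

print(f"scale without outlier: {scale_clean:.4f}, error: {err_clean:.4f}")
print(f"scale with outlier:    {scale_outlier:.4f}, error: {err_outlier:.4f}")
```

In this toy setting a single outlier stretches the scale, so the typical values are represented on a much coarser grid. A model whose activations contain few such outliers, as the abstract reports for Mistral 7B, would be less sensitive to which calibration samples happen to be drawn.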

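By reducing memory usage and increasing operating speed, post-training quantization (PTQ) improves the efficiency of large language models (LLMs) and their compatibility with a wider range of hardware, at the cost of some performance degradation. Our study finds that, across several well-known open-source LLMs, calibration sets are crucial for assessing activation magnitudes and detecting outliers, which can distort the quantization range and harm performance. We therefore suggest re-evaluating the fundamentals of the current quantization literature, shifting from a primary focus on outlier preservation towards optimizing inference speed, to suit the characteristics of modern LLMs.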

The Diminishing Influence of Outliers and Calibration Sets in the Quantization of Modern LLMs

Outliers and Calibration Sets have Diminishing Effect on Quantization of Modern LLMs

Double descent is a counter-intuitive phenomenon in machine
learning, and researchers have observed it in various
models and tasks. While some theoretical explanations have been proposed for
this phenomenon in specific contexts, an accepted theory of the mechanism
behind it in deep learning has yet to be established. In this study, we
revisit the phenomenon of double descent and discuss the conditions of its
occurrence. This paper introduces the concept of class-activation matrices and
a methodology for estimating the effective complexity of functions, with which we
show that over-parameterized models exhibit more distinct and simpler class
patterns in hidden activations than under-parameterized ones. We further
examine the interpolation of noisily labelled data among clean
representations and demonstrate overfitting w.r.t. expressive capacity. By
comprehensively analysing hypotheses and presenting corresponding empirical
evidence that either validates or contradicts these hypotheses, we aim to
provide fresh insights into the phenomenon of double descent and benign
over-parameterization and facilitate future explorations. The source code is
available at this https URL

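This study revisits the double descent phenomenon, discusses the conditions under which it occurs, and introduces the concept of class-activation matrices together with a method for estimating the effective complexity of functions, revealing that over-parameterized models exhibit more distinct and simpler class patterns in their hidden activations. By comprehensively analysing hypotheses and providing corresponding empirical evidence that validates or refutes them, it aims to offer new insights into double descent and benign over-parameterization and to facilitate future exploration.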

Unravelling the Enigma of Deep Double Descent through Class-wise Activation

Class-wise Activation Unravelling the Enigma of Deep Double Descent

Large language models (LLMs) are highly capable at many tasks, but they can
sometimes generate unreliable or inaccurate outputs. To tackle this issue, this
paper studies the problem of uncertainty estimation and calibration for LLMs.
We begin by formulating the uncertainty estimation problem for LLMs and then
propose a supervised approach that takes advantage of labeled datasets to
estimate the uncertainty of the LLMs' responses. Based on the formulation, we
illustrate the difference between the uncertainty estimation for LLMs and that
for standard ML models and explain why the hidden activations of the LLMs
contain uncertainty information. Our approach demonstrates
the benefits of utilizing hidden activations for enhanced uncertainty
estimation across various tasks and shows robust transferability in
out-of-distribution settings. Moreover, we distinguish the uncertainty
estimation task from the uncertainty calibration task and show that a better
uncertainty estimation mode leads to a better calibration performance. In
practice, our method is easy to implement and is adaptable to different levels
of model transparency including black box, grey box, and white box, each
demonstrating strong performance based on the accessibility of the LLM's
internal mechanisms.
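
As a rough illustration of supervised uncertainty estimation from hidden activations, the sketch below trains a small logistic-regression probe that maps an activation vector to the probability that the response was correct. Everything here is an assumption made for illustration: the activations are synthetic Gaussian vectors, the correctness labels are simulated, and the probe and helper names (`train_probe`, `predict_confidence`, `synthetic_example`) are hypothetical. The abstract does not specify the paper's actual model or features.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_confidence(w, b, x):
    """Estimated probability that the response behind activation x is correct."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_probe(features, labels, lr=0.5, epochs=200):
    """Fit logistic regression with full-batch gradient descent."""
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = predict_confidence(w, b, x) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

DIM = 8  # assumed size of the (synthetic) hidden activation vector

def synthetic_example():
    """Simulated (activation, correctness) pair; dimension 0 carries the signal."""
    correct = random.random() < 0.5
    shift = 1.0 if correct else -1.0
    x = [random.gauss(shift if j == 0 else 0.0, 1.0) for j in range(DIM)]
    return x, (1 if correct else 0)

random.seed(0)
train = [synthetic_example() for _ in range(400)]
test = [synthetic_example() for _ in range(200)]

w, b = train_probe([x for x, _ in train], [y for _, y in train])
hits = sum((predict_confidence(w, b, x) >= 0.5) == (y == 1) for x, y in test)
accuracy = hits / len(test)
print(f"held-out accuracy of the probe: {accuracy:.2f}")
```

With a real model, the synthetic vectors would be replaced by hidden activations extracted in a white-box setting, and the labels by correctness judgments from a labeled dataset. Grey-box and black-box variants would need features derived from whatever outputs are accessible.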

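Using labeled datasets, this paper studies uncertainty estimation and calibration for large language models (LLMs). It proposes a supervised approach for estimating the uncertainty of LLM responses, shows the benefit of using hidden activations to improve uncertainty estimation across tasks and its robustness in out-of-distribution settings, distinguishes the uncertainty estimation task from the uncertainty calibration task, and shows that a better uncertainty estimation mode leads to better calibration performance.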