BriefGPT.xyz
Aug, 2024
Does Alignment Tuning Really Break LLMs' Internal Confidence?
Hongseok Oh, Wonseok Hwang
TL;DR
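This study examines calibration degradation in large language models along four dimensions: models, calibration metrics, tasks, and confidence extraction methods. The results show that although the relationship between alignment and calibration is not always a trade-off, under strict analysis conditions the alignment process consistently harms calibration. The study therefore stresses the need for caution when measuring model confidence and calibration error, and calls for future research to develop algorithms that improve both instruction following and calibration.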
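This summary does not say which calibration metrics or confidence extraction methods the paper uses. As a generic illustration of what "calibration error" can mean, the sketch below computes expected calibration error (ECE) with equal-width bins, one widely used measure. The function name, the bin count, and the input format (one confidence in [0, 1] and one correctness flag per answer) are assumptions made for this example, not details taken from the paper.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: the sample-weighted mean of |accuracy - mean confidence| per bin.

    confidences: list of floats in [0, 1], one per prediction (hypothetical input format).
    correct:     list of 0/1 or bool flags, True where the prediction was right.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("at least one prediction is required")

    # Bin b covers [b / n_bins, (b + 1) / n_bins); a confidence of 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, float(hit)))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += (len(members) / total) * abs(accuracy - avg_conf)
    return ece


if __name__ == "__main__":
    # Four answers given with 90% confidence, three of them right.
    print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0]))
```

A lower value means stated confidence tracks observed accuracy more closely. The result depends on the binning scheme and on how confidence is extracted from the model, which is one reason the study urges caution when measuring calibration error.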
Abstract
Large Language Models (LLMs) have shown remarkable progress, but their real-world application necessitates reliable calibration. This study conducts a comprehensive analysis of