BriefGPT.xyz
Jul, 2023
FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets
Seonghyeon Ye, Doyoung Kim, Sungdong Kim, Hyeonbin Hwang, Seungone Kim...
TL;DR
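FLASK is a fine-grained, skill-set-based evaluation protocol for language models. It decomposes coarse-grained scoring into instance-wise, skill-set-level scoring, which measures model performance more accurately, and its analysis can help make language models more proficient in specific skills.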
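The decomposition described above can be illustrated with a minimal sketch. This is not the authors' implementation: the skill names, the assumed 1-5 scale, the toy data, and the helpers `coarse_score` and `skill_profile` are all hypothetical. Each instance carries scores only for the skills it is assumed to require, and each skill is averaged over only the instances that require it.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical instance-wise annotations: each instruction is tagged only with
# the skills it requires, and each required skill gets its own score
# (a 1-5 scale is assumed here for illustration).
annotations = [
    {"id": "q1", "scores": {"logical_correctness": 4, "factuality": 3}},
    {"id": "q2", "scores": {"factuality": 5, "readability": 4, "conciseness": 2}},
    {"id": "q3", "scores": {"logical_correctness": 2, "readability": 5}},
]


def coarse_score(instance):
    """Collapse one instance into a single number (the coarse-grained view)."""
    return mean(instance["scores"].values())


def skill_profile(instances):
    """Average each skill over only the instances that require that skill."""
    per_skill = defaultdict(list)
    for inst in instances:
        for skill, score in inst["scores"].items():
            per_skill[skill].append(score)
    # Sort by skill name so the report order is deterministic.
    return {skill: mean(vals) for skill, vals in sorted(per_skill.items())}


if __name__ == "__main__":
    for inst in annotations:
        print(inst["id"], "coarse:", round(coarse_score(inst), 2))
    print("per-skill:", skill_profile(annotations))
```

In this toy data, q1 and q3 receive the same coarse score of 3.5, yet the per-skill view shows q1 scoring higher on logical correctness while q3 contributes a high readability score. A single averaged number hides that kind of difference, and a skill-level breakdown is meant to surface it.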
Abstract
Evaluation of large language models (LLMs) is challenging because aligning to human values requires the composition of multiple skills and the required set of skills varies depending on the instruction. Recent st