BriefGPT.xyz
Apr, 2024
Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think
Xinpeng Wang, Chengzhi Hu, Bolei Ma, Paul Röttger, Barbara Plank
TL;DR
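Our study finds that text answers are more robust than first-token probabilities, especially under question perturbations and changes in option order. This further supports evaluating models on their text answers rather than on first-token probabilities.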
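To make the two evaluation approaches concrete, here is a minimal Python sketch of both readouts. It is not the authors' code: the function names, the four-option label set, the regex-based answer extraction, and the log-probability values are all illustrative assumptions.

```python
import re

# Assumed option labels; real benchmarks may use a different set.
OPTIONS = ["A", "B", "C", "D"]

def first_token_choice(first_token_logprobs):
    """Pick the option label with the highest first-token log probability.

    `first_token_logprobs` is a hypothetical dict mapping option labels to
    the log probability the model assigns to that label as its first
    output token. Missing labels are treated as having probability zero.
    """
    return max(OPTIONS, key=lambda o: first_token_logprobs.get(o, float("-inf")))

def text_choice(response):
    """Naively extract the first standalone option letter from generated text.

    Returns None when no option letter is found (for example, a refusal).
    This regex is a deliberately simple stand-in for a real answer parser.
    """
    match = re.search(r"\b([A-D])\b", response)
    return match.group(1) if match else None

# Made-up values for illustration only.
logprobs = {"A": -0.4, "B": -2.1, "C": -1.3, "D": -3.0}
response = "The answer is C."

print(first_token_choice(logprobs))  # first-token readout
print(text_choice(response))         # text-answer readout
```

With these invented numbers the two readouts disagree, which shows why the choice of evaluation method can change the measured result.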
Abstract
Multiple choice questions (MCQs) are commonly used to evaluate the capabilities of large language models (LLMs). One common way to evaluate the model response is to rank the candidate answers based on the log pro