BriefGPT.xyz
Mar, 2024
Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators
Yinhong Liu, Han Zhou, Zhijiang Guo, Ehsan Shareghi, Ivan Vulic...
TL;DR
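The Pairwise-preference Search (PAIRS) method evaluates candidate texts through pairwise comparison, addressing the bias and incoherence that large language models (LLMs) exhibit when used as evaluators.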
Abstract
Large language models (LLMs) have demonstrated promising capabilities as automatic evaluators in assessing the quality of generated natural language. However, LLMs still exhibit biases in evaluation and often str...