BriefGPT.xyz
Oct, 2023
Which Prompts Make The Difference? Data Prioritization For Efficient Human LLM Evaluation
Meriem Boubdir, Edward Kim, Beyza Ermis, Marzieh Fadaee, Sara Hooker
TL;DR
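Using a metric-based approach, our study aims to minimize the number of annotations required for human evaluation, improving evaluation quality while reducing time and cost. We find that this approach effectively reduces ambiguous outcomes, which has important implications for future evaluation of large language models.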
Abstract
Human evaluation is increasingly critical for assessing large language models, capturing linguistic nuances, and reflecting user preferences more accurately than traditional