BriefGPT.xyz
Jun, 2024
Open-LLM-Leaderboard: From Multi-choice to Open-style Questions for LLMs Evaluation, Benchmark, and Arena
Aidar Myrzakhan, Sondos Mahmoud Bsharat, Zhiqiang Shen
TL;DR
By adopting fully open-style questions, this study addresses the problems of selection bias and random guessing in multiple-choice questions, and establishes a new benchmark for evaluating language models.
Abstract
Multiple-choice questions (MCQs) are frequently used to assess large language models (LLMs). Typically, an LLM is given a question and selects the answer deemed most probable after adjustments for factors like len…