BriefGPT.xyz
Aug, 2023
GameEval: Evaluating LLMs on Conversational Games
Dan Qiao, Chenfei Wu, Yaobo Liang, Juntao Li, Nan Duan
TL;DR
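GameEval proposes a new method for evaluating large language models through goal-driven conversational games. It enables a comprehensive assessment of model performance and shows the models' integrated ability to solve complex problems.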
Abstract
The rapid advancements in large language models (LLMs) have presented challenges in evaluating those models. Existing evaluation methods are either reference-based or preference-based, which inevitably need human …