BriefGPT.xyz
Aug, 2023
AgentSims: An Open-Source Sandbox for Large Language Model Evaluation
Jiaju Lin, Haoran Zhao, Aochi Zhang, Yiting Wu, Huqiuyue Ping...
TL;DR
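Uses AgentSims to build task-based evaluation methods that address the limitations of existing evaluation approaches, and provides easy-to-use infrastructure for researchers to test the capabilities of large language models.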
Abstract
With ChatGPT-like large language models (LLMs) prevailing in the community, how to evaluate the ability of LLMs is an open question. Existing evaluation methods suffer from the following shortcomings: (1) constrained…