AgentSims: An Open-Source Sandbox for Large Language Model Evaluation

Aug, 2023

AgentSims: An Open-Source Sandbox for Large Language Model Evaluation

Jiaju Lin, Haoran Zhao, Aochi Zhang, Yiting Wu, Huqiuyue Ping...

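TL;DR: AgentSims is used to build task-based evaluation that addresses the limitations of existing evaluation methods, and it provides easy-to-use infrastructure for researchers to test the capabilities of large language models.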

Abstract

With ChatGPT-like large language models (LLMs) prevailing in the community, how to evaluate the ability of LLMs is an open question. Existing evaluation methods suffer from the following shortcomings: (1) constrained