BriefGPT.xyz
Jul, 2023
RoCar: A Relationship Network-based Evaluation Method to Large Language Models
Ming Wang, Wenfang Wu, Chongyun Gao, Daling Wang, Shi Feng...
TL;DR
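We propose the RoCar method, which uses defined basic schemas to randomly construct a task graph and then generates natural-language evaluation tasks from that graph, in order to evaluate the reasoning and memory abilities of LLMs separately. Because the task construction process is highly random, the LLMs under test cannot have directly learned the evaluation tasks, which keeps the evaluation fair.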
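The idea of building a random task graph from basic schemas and turning it into natural-language tasks can be illustrated with a small sketch. This is not the authors' implementation: the schema list, the names, and every function below (`build_task_graph`, `to_statements`, `memory_task`, `reasoning_task`) are hypothetical choices made only for illustration, assuming a directed graph of people linked by labeled relations.

```python
import random

# Hypothetical basic schemas: (relation label, sentence template).
SCHEMAS = [
    ("parent", "{a} is a parent of {b}."),
    ("friend", "{a} is a friend of {b}."),
    ("colleague", "{a} is a colleague of {b}."),
]
# Hypothetical node names for the relationship network.
NAMES = ["Ava", "Ben", "Cara", "Dan", "Eve", "Finn"]


def build_task_graph(n_edges, rng):
    """Randomly pick n_edges distinct directed edges (a, relation, b)."""
    edges = set()
    while len(edges) < n_edges:
        a, b = rng.sample(NAMES, 2)  # two different people
        relation, _ = rng.choice(SCHEMAS)
        edges.add((a, relation, b))
    return sorted(edges)


def to_statements(edges):
    """Render every edge as a natural-language sentence."""
    templates = dict(SCHEMAS)
    return [templates[rel].format(a=a, b=b) for a, rel, b in edges]


def memory_task(edges, rng):
    """Ask for a fact that was stated directly in the context."""
    a, rel, _ = rng.choice(edges)
    question = f"Who is {a} a {rel} of?"
    answers = sorted(y for x, r, y in edges if x == a and r == rel)
    return question, answers


def reasoning_task(edges, rng):
    """Ask for a two-hop connection; returns None if the graph has no chain."""
    chains = [
        (e1, e2)
        for e1 in edges
        for e2 in edges
        if e1[2] == e2[0] and e1[0] != e2[2]
    ]
    if not chains:
        return None
    (a, r1, b), (_, r2, c) = rng.choice(chains)
    question = f"How is {a} connected to {c} through {b}?"
    answer = f"{a} is a {r1} of {b}, and {b} is a {r2} of {c}."
    return question, answer


rng = random.Random(0)  # fixed seed only so the example is repeatable
edges = build_task_graph(8, rng)
statements = to_statements(edges)
mem_question, mem_answers = memory_task(edges, rng)
reasoning = reasoning_task(edges, rng)
```

Using a fresh seed for each evaluation run would yield a different graph and therefore different tasks, which is the property the summary relies on for fairness.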
Abstract
Large language models (LLMs) have received increasing attention. However, due to the complexity of their capabilities, how to rationally evaluate the capabilities of LLMs is still a task to be solved. We propose the RoCar ...