RoCar: A Relationship Network-based Evaluation Method for Large Language Models

Jul, 2023


RoCar: A Relationship Network-based Evaluation Method to Large Language Models

Ming Wang, Wenfang Wu, Chongyun Gao, Daling Wang, Shi Feng...

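TL;DR: We propose RoCar, a method that randomly constructs task graphs from a set of defined basic schemas and then generates natural-language evaluation tasks from those graphs, in order to evaluate the reasoning and memory abilities of LLMs separately. Because the task-construction process is highly random, the LLMs under test cannot have learned the evaluation tasks directly, which keeps the evaluation fair.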
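The following is a minimal Python sketch of the idea summarized above: randomly instantiate a basic schema as a small relationship graph, render its edges as natural-language sentences, and ask a question whose answer is fixed by the graph. Everything concrete here is a hypothetical assumption for illustration, not RoCar's actual design: the family-relation schemas in `SCHEMAS`, the name pool in `NAMES`, the sentence templates, and the split into a "reasoning" task (the answer requires chaining two stated relations) and a "memory" task (the answer is a stated relation that must be recalled).

```python
import random
from dataclasses import dataclass

# HYPOTHETICAL basic schemas (not taken from the paper): a two-hop chain (r1, r2)
# and the relation it implies. "A is the r1 of B" and "B is the r2 of C"
# together imply "A is the <implied> of C".
SCHEMAS = {
    ("parent", "parent"): "grandparent",
    ("child", "child"): "grandchild",
    ("sibling", "sibling"): "sibling",  # simplification: full siblings assumed
}

# HYPOTHETICAL name pool; randomly sampled names make each task instance unlikely
# to appear verbatim in any model's training data.
NAMES = ["Alice", "Bob", "Carol", "David", "Erin", "Frank", "Grace", "Heidi"]


@dataclass
class Task:
    prompt: str
    answer: str


def build_task_graph(rng):
    """Randomly instantiate one schema as (head, relation, tail) edges plus the implied edge."""
    (r1, r2), implied = rng.choice(sorted(SCHEMAS.items()))
    a, b, c = rng.sample(NAMES, 3)  # three distinct people
    edges = [(a, r1, b), (b, r2, c)]
    return edges, (a, implied, c)


def describe(edges, rng):
    """Render the graph's edges as natural-language sentences in random order."""
    sentences = [f"{h} is the {r} of {t}." for h, r, t in edges]
    rng.shuffle(sentences)
    return " ".join(sentences)


def reasoning_task(rng):
    """Ask for a relation that is never stated and must be inferred from the chain."""
    edges, (a, implied, c) = build_task_graph(rng)
    return Task(f"{describe(edges, rng)} What is {a} to {c}?", implied)


def memory_task(rng):
    """Ask to recall one relation that was stated explicitly (no inference needed)."""
    edges, _ = build_task_graph(rng)
    h, r, t = rng.choice(edges)
    return Task(f"{describe(edges, rng)} What is {h} to {t}?", r)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only so a run is reproducible
    print(reasoning_task(rng))
    print(memory_task(rng))
```

In this sketch the randomness lives in three places (schema choice, name sampling, sentence order), which mirrors the TL;DR's argument that a model cannot have memorized a specific generated task; a real evaluation would presumably use richer schemas and larger graphs.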

Abstract

Large language models (LLMs) have received increasing attention. However, because their capabilities are complex, how to evaluate LLMs rationally remains an open problem. We propose the RoCar