BriefGPT.xyz
Oct, 2023
Pseudointelligence: A Unifying Framework for Language Model Evaluation
Shikhar Murty, Orr Paradise, Pratyusha Sharma
TL;DR
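Inspired by pseudorandomness, we propose pseudointelligence, which captures the maxim that "intelligence is in the eye of the beholder." Specifically, we propose a complexity-theoretic framework that casts model evaluation as a dynamic interaction between a model and a learned evaluator. We demonstrate that this framework can be used to reason about two case studies in language model evaluation and to analyze existing evaluation methods.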
Abstract
With large language models surpassing human performance on an increasing number of benchmarks, we must take a principled approach for targeted evaluation of model capabilities. Inspired by pseudorandomness, we pr