BriefGPT.xyz
Jun, 2024
GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents
Anthony Costarelli, Mat Allen, Roman Hauksson, Grace Sodunke, Suhas Hariharan...
TL;DR
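GameBench, a cross-domain benchmark that uses games to evaluate the strategic reasoning abilities of large language models, shows that most of the tested models fall short of human-level performance, although two strategic-reasoning frameworks (CoT and RAP) do improve scores.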
Abstract
Large language models have demonstrated remarkable few-shot performance on many natural language understanding tasks. Despite several demonstrations of using