BriefGPT.xyz
Aug, 2024
LogicGame: Benchmarking Rule-Based Reasoning Abilities of Large Language Models
Jiayi Gui, Yiming Liu, Jiale Cheng, Xiaotao Gu, Xiao Liu...
TL;DR
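This study addresses a gap in how large language models (LLMs) are evaluated on rule-based reasoning and plan execution by proposing the LogicGame benchmark. The benchmark uses a diverse set of game scenarios to assess how well models understand, execute, and plan with rules, and it finds significant shortcomings in all three areas, a result of practical importance.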
Abstract
Large Language Models (LLMs) have demonstrated notable capabilities across various tasks, showcasing complex problem-solving abilities. Understanding and executing complex rules, along with multi-step planning, are fundamental to...