BriefGPT.xyz
Dec, 2023
T-Eval: Evaluating the Tool Utilization Capability Step by Step
Zehui Chen, Weihua Du, Wenwei Zhang, Kuikun Liu, Jiangning Liu...
TL;DR
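Evaluating the tool-utilization capability of large language models requires a fine-grained decomposition into sub-processes: instruction following, planning, reasoning, retrieval, understanding, and review. T-Eval evaluates tool utilization along each of these sub-processes. Its results are consistent with outcome-oriented evaluation, and it also supports a fine-grained analysis of LLM capabilities.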
Abstract
Large language models (LLMs) have achieved remarkable performance on various NLP tasks and are augmented by tools for broader applications. Yet, how to evaluate and analyze the tool-utilization capability of LLMs...