TL;DR为了有效评估 Large Language Models(LLMs) 使用外部工具回答问题的能力,我们开发了一个名为 ToolQA 的新数据集,并使用可伸缩的自动化过程进行数据集的管理,并使用13种专门设计的工具进行交互以回答问题。
Abstract
large language models (LLMs) have demonstrated impressive performance in various NLP tasks, but they still suffer from challenges such as hallucination and weak numerical reasoning. To overcome these challenges, externa