Exploring and Benchmarking the Planning Capabilities of Large Language Models

Jun, 2024

Exploring and Benchmarking the Planning Capabilities of Large Language Models

Bernd Bohnet, Azade Nova, Aaron T Parisi, Kevin Swersky, Katayoon Goshvadi...

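TL;DR: Improving the planning capabilities of large language models; the directions studied include in-context learning, fine-tuning, and performance evaluation on unseen domains.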

Abstract

We seek to elevate the planning capabilities of large language models (LLMs), investigating four main directions. First, we construct a comprehensive benchmark suite encompassing both classical planning domains and