In this paper, we investigate the performance of LLMs on complex planning tasks that require them to understand a virtual spatial environment simulated via natural language and to act correspondingly in text. We propose a benchmark named Natural Language Planning (NLP), composed of a set of novel tasks: Brick World, NLVR-based Manipulations, and Natural Language Navigation. We find that current popular LLMs such as ChatGPT still lack the ability to handle complex planning. This raises a question: do LLMs have a good understanding of environments described in natural language, or are alternatives such as symbolic representations more compact and hence easier for LLMs to understand? To this end, we propose a novel method called CoS (Chain-of-Symbol Prompting) that represents complex environments with condensed symbolic spatial representations during the chained intermediate thinking steps. CoS is easy to use and requires no additional training of the LLM. Extensive experiments on ChatGPT and InstructGPT indicate that CoS clearly outperforms Chain-of-Thought (CoT) Prompting on all three planning tasks while using fewer input tokens than CoT. The performance gain is large: up to 60.8 percentage points of accuracy (from 31.8% to 92.6%) on Brick World for ChatGPT. CoS also substantially reduces the number of tokens in the prompt, by up to 65.8% (from 407 to 139 tokens) for the intermediate steps of the demonstrations on Brick World.
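To make the idea of condensed symbolic spatial representations concrete, the following is a minimal sketch and not the authors' implementation. The sentence template, the `X/Y` symbol notation (read as "X is on top of Y"), and the helper names `to_symbols`, `removal_order`, and `relative_reduction` are all assumptions introduced here for illustration. The last helper only re-derives the relative token reduction from the two token counts reported in the abstract.

```python
import re

# Hypothetical Brick World style description (assumed sentence template).
description = [
    "The brick B is on top of the brick A.",
    "The brick C is on top of the brick B.",
    "The brick D is on top of the brick C.",
]

PATTERN = re.compile(r"The brick (\w+) is on top of the brick (\w+)\.")


def to_symbols(sentences):
    """Condense each 'X is on top of Y' sentence into the assumed symbol 'X/Y'."""
    symbols = []
    for sentence in sentences:
        match = PATTERN.fullmatch(sentence)
        if match is None:
            raise ValueError(f"unsupported sentence: {sentence!r}")
        symbols.append(f"{match.group(1)}/{match.group(2)}")
    return symbols


def removal_order(symbols, target):
    """List the bricks to remove, topmost first, ending with `target`."""
    # Map each brick to the brick lying directly on top of it.
    above = {lower: upper for upper, lower in (s.split("/") for s in symbols)}
    stack = [target]
    while stack[-1] in above:
        stack.append(above[stack[-1]])
    return stack[::-1]


def relative_reduction(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100 * (before - after) / before


symbols = to_symbols(description)
print(", ".join(symbols))                     # condensed symbolic chain
print(removal_order(symbols, "A"))            # plan for grabbing brick A
print(round(relative_reduction(407, 139), 1)) # token counts from the abstract
```

In a prompting setup, a symbol string of this kind would stand in for the longer natural-language sentences in the intermediate reasoning steps of each demonstration, which is where the reported token savings arise.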

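This paper proposes a benchmark named Natural Language Planning (NLP), composed of the novel tasks Brick World, NLVR-based Manipulations, and Natural Language Navigation. It studies how LLMs perform on complex planning tasks that require understanding a virtual spatial environment described in natural language and acting correspondingly in text, and finds that popular LLMs such as ChatGPT lack the ability to plan in complex settings. It therefore proposes CoS, a new method for LLMs that represents environments through symbolic spatial representations and significantly improves ChatGPT's performance on all three planning tasks.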

Chain-of-Symbol Prompting Elicits Planning in Large Language Models