Grounding the reasoning ability of large language models (LLMs) for embodied tasks is challenging due to the complexity of the physical world. Especially, LLM planning for multi-agent collaboration requires communication of agents or credit assignment as the feedback to re-adjust the proposed plans and achieve effective coordination. However, existing methods that overly rely on physical verification or self-reflection suffer from excessive and inefficient querying of LLMs. In this paper, we propose a novel framework for multi-agent collaboration that introduces Reinforced Advantage feedback (ReAd) for efficient self-refinement of plans. Specifically, we perform critic regression to learn a sequential advantage function from LLM-planned data, and then treat the LLM planner as an optimizer to generate actions that maximize the advantage function. It endows the LLM with the foresight to discern whether the action contributes to accomplishing the final task. We provide theoretical analysis by extending advantage-weighted regression in reinforcement learning to multi-agent systems. Experiments on Overcooked-AI and a difficult variant of RoCoBench show that ReAd surpasses baselines in success rate, and also significantly decreases the interaction steps of agents and query rounds of LLMs, demonstrating its high efficiency for grounding LLMs. More results are given at \url{https://read-llm.github.io/}.

通过引入增强优势反馈（ReAd）的多智能体协作模型，我们提出了一种新的用于解决复杂物理世界中大型语言模型（LLMs）推理能力的框架，该框架通过对LLM计划数据进行评论回归来学习顺序优势函数，并将LLM规划器视为最优化器生成最大化优势函数的行动，从而为LLM赋予了能够判断行动是否有助于完成最终任务的远见。

迈向高效的LLM对实体多智能体协作的隶属