Recently, there has been increasing interest in automated prompt optimization based on reinforcement learning (RL). This approach offers important advantages: it generates interpretable prompts and is compatible with black-box foundation models. However, the large size of the prompt space poses challenges for RL-based methods, often causing them to converge to suboptimal policies. This paper introduces MultiPrompter, a new framework that treats prompt optimization as a cooperative game between prompters that take turns composing a single prompt together. Our cooperative prompt optimization effectively reduces the problem size and helps the prompters learn optimal prompts. We test our method on the text-to-image task and show that it generates higher-quality images than the baselines.
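
The cooperative, turn-taking setup described above can be illustrated with a toy sketch. This is not the MultiPrompter implementation. It rests on our own assumptions: two prompters with small hypothetical vocabularies (`VOCABS`) alternate turns appending one word each. A hypothetical `toy_reward` stands in for the black-box text-to-image quality score. Each prompter is trained with plain REINFORCE on the single reward that all prompters share.

```python
import math
import random

# Hypothetical toy problem (not from the paper): two prompters alternate turns,
# each appending one word from its own vocabulary to a shared prompt.
VOCABS = [
    ["cat", "dog", "car"],            # prompter 0 (assumed subject words)
    ["painting", "photo", "sketch"],  # prompter 1 (assumed style words)
]
N_PROMPTERS = len(VOCABS)
TURNS = 4                      # turn order: prompter 0, 1, 0, 1
SLOTS = TURNS // N_PROMPTERS   # number of turns each prompter takes

# Hidden target used only by the stand-in reward below (assumption).
TARGET = ["cat", "painting", "dog", "sketch"]


def toy_reward(words):
    """Stand-in for a black-box image-quality score: fraction of matched positions."""
    return sum(w == t for w, t in zip(words, TARGET)) / len(TARGET)


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(episodes=5000, lr=0.2, seed=0):
    rng = random.Random(seed)
    # logits[i][s][a]: prompter i's preference for word a at its s-th turn.
    logits = [[[0.0] * len(v) for _ in range(SLOTS)] for v in VOCABS]
    baseline = 0.0
    for _ in range(episodes):
        words, choices = [], []
        for t in range(TURNS):
            i, s = t % N_PROMPTERS, t // N_PROMPTERS  # whose turn, which slot
            probs = softmax(logits[i][s])
            a = rng.choices(range(len(probs)), weights=probs)[0]
            words.append(VOCABS[i][a])
            choices.append((i, s, a, probs))
        # Cooperative game: every prompter is updated with the same reward.
        r = toy_reward(words)
        adv = r - baseline                     # advantage vs. running baseline
        baseline = 0.9 * baseline + 0.1 * r
        for i, s, a, probs in choices:
            for k in range(len(probs)):
                # REINFORCE gradient of log softmax w.r.t. logit k
                grad = (1.0 if k == a else 0.0) - probs[k]
                logits[i][s][k] += lr * adv * grad
    return logits


def greedy_prompt(logits):
    """Compose a prompt by letting each prompter pick its most likely word per turn."""
    words = []
    for t in range(TURNS):
        i, s = t % N_PROMPTERS, t // N_PROMPTERS
        row = logits[i][s]
        words.append(VOCABS[i][row.index(max(row))])
    return words


learned = train()
print(" ".join(greedy_prompt(learned)))
```

In this toy, one reading of "reduces the problem size" is that each prompter only chooses among 3 × 3 = 9 word sequences for its own two turns, while a single agent composing all four positions would face 3^4 = 81 sequences. This arithmetic describes the sketch only and is not a claim about the paper's actual prompt space.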

Cooperative Prompt Optimization with Multi-Agent Reinforcement Learning