Several recent advances in AI systems (e.g., Tree-of-Thoughts and Program-Aided Language Models) solve problems by providing a "scaffolding" program that structures multiple calls to language models to generate better outputs. A scaffolding program is written in a programming language such as Python. In this work, we use a language-model-infused scaffolding program to improve itself. We start with a seed "improver" that improves an input program according to a given utility function by querying a language model several times and returning the best solution. We then run this seed improver to improve itself. Across a small set of downstream tasks, the resulting improved improver generates programs with significantly better performance than its seed improver. Afterward, we analyze the variety of self-improvement strategies proposed by the language model, including beam search, genetic algorithms, and simulated annealing. Since the language models themselves are not altered, this is not full recursive self-improvement. Nonetheless, it demonstrates that a modern language model, GPT-4 in our proof-of-concept experiments, is capable of writing code that can call itself to improve itself. We critically consider concerns around the development of self-improving technologies and evaluate the frequency with which the generated code bypasses a sandbox.
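The seed improver described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: `seed_improver`, `mock_language_model`, `toy_utility`, and `SEED_PROGRAM` are hypothetical names, no real language model is called, and keeping the input program in the candidate pool is a choice made for this sketch.

```python
import re
from typing import Callable, List

Utility = Callable[[str], float]
LanguageModel = Callable[[str, int], List[str]]


def seed_improver(program: str, utility: Utility,
                  language_model: LanguageModel, n_candidates: int = 4) -> str:
    """Query the model for several candidates and return the best-scoring one."""
    candidates = list(language_model(program, n_candidates))
    # Assumption of this sketch: keep the input in the pool so the
    # returned program never scores lower than the one passed in.
    candidates.append(program)
    return max(candidates, key=utility)


def mock_language_model(program: str, n: int) -> List[str]:
    """Hypothetical stand-in for a real model: bumps the first integer literal."""
    match = re.search(r"-?\d+", program)
    if match is None:
        return []
    value = int(match.group())
    return [
        program[:match.start()] + str(value + k) + program[match.end():]
        for k in range(1, n + 1)
    ]


def toy_utility(program: str) -> float:
    """Score a program by how close its guess() is to the target value 10."""
    namespace: dict = {}
    exec(program, namespace)  # acceptable here only because the toy input is trusted
    return -abs(namespace["guess"]() - 10)


SEED_PROGRAM = "def guess():\n    return 3\n"

# Apply the improver a few times to the toy program.
program = SEED_PROGRAM
for _ in range(3):
    program = seed_improver(program, toy_utility, mock_language_model)
print(program)
```

The self-improvement step would correspond to passing the improver's own source code as `program`, together with a utility that scores improvers on downstream tasks. The toy above does not attempt that; it only shows the query-several-times, keep-the-best loop.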

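Starting from a language-model-infused scaffolding program as a seed, the approach improves an input program by querying a language model several times and returning the best solution, and then applies the same procedure to improve itself. Building on this, an analysis of the self-improvement strategies that emerge, including beam search, genetic algorithms, and simulated annealing, shows that a modern language model (GPT-4 in our proof-of-concept experiments) can write code that calls itself to improve itself. The work also considers the concerns raised by the development of self-improving technologies and evaluates how often the generated code bypasses a sandbox.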

Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation