Prompt optimization aims to find the best prompt to a large language model
(LLM) for a given task. LLMs have been successfully used to help find and
improve prompt candidates for single-step tasks. However, realistic tasks for
agents are multi-step and introduce new challenges: (1) prompt content is
likely to be more extensive and complex, making it more difficult for LLMs to
analyze errors, (2) the impact of an individual step is difficult to evaluate,
and (3) different people may have varied preferences about task execution.
While humans struggle to optimize prompts, they are good at providing feedback
about LLM outputs; we therefore introduce a new LLM-driven discrete prompt
optimization framework that incorporates human-designed feedback rules about
potential errors to automatically offer direct suggestions for improvement. Our
framework is stylized as a genetic algorithm in which an LLM generates new
candidate prompts from a parent prompt and its associated feedback; we use a
learned heuristic function that predicts prompt performance to efficiently
sample from these candidates. This approach significantly outperforms both
human-engineered prompts and several other prompt optimization methods across
eight representative multi-step tasks (average improvements of 27.7% and 28.2%
over the current best methods on GPT-3.5 and GPT-4, respectively). We further show
that the score function for tasks can be modified to better align with
individual preferences. We believe our work can serve as a benchmark for
automatic prompt optimization for LLM-driven multi-step tasks. Datasets and
code are available at this https URL. The project page is available at this
https URL.
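
The loop described in the abstract (an LLM proposes child prompts from a parent prompt and its feedback, and a score predictor decides which children are worth evaluating) can be illustrated with a minimal sketch. This is not the authors' implementation: `optimize_prompt`, `toy_evaluate`, `toy_propose`, and `toy_predict` are hypothetical names, the LLM call and the learned heuristic are replaced by deterministic toy stand-ins, and the selection scheme is an assumption.

```python
from typing import Callable, List, Tuple


def optimize_prompt(
    seed_prompt: str,
    propose: Callable[[str, str], List[str]],      # hypothetical: LLM rewriting a parent prompt given its feedback
    evaluate: Callable[[str], Tuple[float, str]],  # hypothetical: runs the task, returns (score, rule-based feedback)
    predict_score: Callable[[str], float],         # hypothetical: cheap learned estimate of a prompt's score
    generations: int = 3,
    keep: int = 2,
) -> Tuple[str, float]:
    """Genetic-style search: propose children, filter by predicted score, evaluate the survivors."""
    score, feedback = evaluate(seed_prompt)
    population = [(seed_prompt, score, feedback)]
    best = (seed_prompt, score)
    for _ in range(generations):
        candidates: List[str] = []
        for parent, _, parent_feedback in population:
            candidates.extend(propose(parent, parent_feedback))
        # Only the candidates the heuristic ranks highest get the expensive task evaluation.
        candidates.sort(key=predict_score, reverse=True)
        evaluated = []
        for candidate in candidates[:keep]:
            cand_score, cand_feedback = evaluate(candidate)
            evaluated.append((candidate, cand_score, cand_feedback))
            if cand_score > best[1]:
                best = (candidate, cand_score)
        if not evaluated:
            break
        population = evaluated
    return best


# Toy stand-ins (assumptions for illustration only).
REQUIRED = ["check inventory", "avoid collisions"]


def toy_evaluate(prompt: str) -> Tuple[float, str]:
    # Human-designed feedback rules: one message per behavior the prompt fails to request.
    missing = [r for r in REQUIRED if r not in prompt]
    score = 1.0 - len(missing) / len(REQUIRED)
    return score, "; ".join(f"agent failed to {m}" for m in missing)


def toy_propose(parent: str, feedback: str) -> List[str]:
    # Stands in for an LLM call: one unhelpful variant plus one fix per feedback item.
    children = [parent + " Please be concise."]
    for item in feedback.split("; "):
        if item:
            children.append(parent + " Always " + item.replace("agent failed to ", "") + ".")
    return children


def toy_predict(prompt: str) -> float:
    # Stands in for the learned score predictor.
    return float(prompt.count("Always"))


best_prompt, best_score = optimize_prompt(
    "You are a warehouse robot.", toy_propose, toy_evaluate, toy_predict
)
print(best_score, best_prompt)
```

Aligning with individual preferences, as the abstract mentions, would in this sketch amount to swapping in a different `evaluate` function; the search loop itself would stay the same.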

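By incorporating human-designed feedback rules into an LLM-driven discrete prompt optimization framework based on a genetic algorithm, this work improves automatic prompt optimization for multi-step tasks, with average improvements of 27.7% and 28.2% over existing methods.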

PRompt Optimization in Multi-Step Tasks (PROMST): Integrating Human Feedback and Preference Alignment

Machine learning models may involve decision boundaries that change over time
due to updates to rules and regulations, such as in loan approvals or claims
management. However, in such scenarios, it may take time for sufficient
training data to accumulate in order to retrain the model to reflect the new
decision boundaries. While work has been done to reinforce existing decision
boundaries, very little has been done to cover these scenarios where decision
boundaries of the ML models should change in order to reflect new rules. In
this paper, we focus on user-provided feedback rules as a way to expedite the
ML model update process, and we formally introduce the problem of
pre-processing training data to edit an ML model in response to feedback rules
such that once the model is retrained on the pre-processed data, its decision
boundaries align more closely with the rules. To solve this problem, we propose
a novel data augmentation method, the Feedback Rule-Based Oversampling
Technique. Extensive experiments using different ML models and real-world
datasets demonstrate the effectiveness of the method, in particular the benefit
of augmentation and the ability to handle many feedback rules.
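
The abstract does not spell out the augmentation procedure, so the following is only a rough sketch of the general idea of pre-processing training data in response to a feedback rule, not the paper's method. It assumes a rule is a coverage predicate plus a mandated label, relabels existing rows that the rule covers, and adds synthetic rows by SMOTE-like interpolation between covered rows. `FeedbackRule` and `augment_with_rule` are hypothetical names.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]


@dataclass
class FeedbackRule:
    covers: Callable[[Row], bool]  # which rows the rule applies to
    label: int                     # label the rule mandates inside its coverage


def augment_with_rule(
    rows: List[Row],
    labels: List[int],
    rule: FeedbackRule,
    n_synthetic: int,
    seed: int = 0,
) -> Tuple[List[Row], List[int]]:
    """Relabel covered rows, then oversample the rule's coverage region (assumed scheme)."""
    rng = random.Random(seed)
    new_rows = [dict(r) for r in rows]
    new_labels = [rule.label if rule.covers(r) else y for r, y in zip(rows, labels)]
    covered = [r for r in rows if rule.covers(r)]
    added, attempts = 0, 0
    while covered and added < n_synthetic and attempts < 100 * n_synthetic:
        attempts += 1
        a, b = rng.choice(covered), rng.choice(covered)
        t = rng.random()
        # Interpolate feature-wise between two covered rows.
        synthetic = {k: a[k] + t * (b[k] - a[k]) for k in a}
        # Keep the synthetic row only if it still satisfies the rule.
        if rule.covers(synthetic):
            new_rows.append(synthetic)
            new_labels.append(rule.label)
            added += 1
    return new_rows, new_labels


# Toy loan-approval data (made up for illustration).
rows = [
    {"income": 30.0, "debt": 10.0},
    {"income": 60.0, "debt": 20.0},
    {"income": 80.0, "debt": 5.0},
    {"income": 55.0, "debt": 40.0},
]
labels = [0, 0, 1, 0]
# New rule: applicants with income above 50 should be approved.
rule = FeedbackRule(covers=lambda r: r["income"] > 50, label=1)
aug_rows, aug_labels = augment_with_rule(rows, labels, rule, n_synthetic=5)
```

A model retrained on `aug_rows` and `aug_labels` would see more examples consistent with the new rule than the original data contained, which is the effect the abstract attributes to augmentation.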

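This study proposes a feedback rule-based oversampling technique for updating machine learning models: as rules change, the model can be retrained sooner, so its decision boundaries adjust quickly to the new rules.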