Large language models (LLMs) have attracted great attention given their
strong performance on a wide range of NLP tasks. In practice, users often
expect generated texts to fall within a specific length range, which makes
length-controlled generation an important topic, especially for GPT-style
models. Existing length control methods mostly focus on a single, simple
control type: "equal to" a target length. Unlike these methods, we propose a
prompt-based method that achieves length-controlled generation under multiple
control types with high accuracy. In particular, we adopt reinforcement
learning (RL) and sample filtering with reward signals given by rule-based
reward models, which improve the length control ability of the model by
rewarding outputs that follow the given control instructions. In addition, we
introduce a standard prompt extractor that parses arbitrary user input into
standard control instructions. Experiments show that our method significantly
improves the accuracy of prompt-based length control on popular summarization
datasets such as CNNDM and NYT under multiple control types. Moreover, both the
standard prompt extractor and the RL-tuned model generalize well to unseen
control prompt templates.
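The abstract does not spell out the reward rules, so the following Python sketch is only one plausible reading, built on stated assumptions: length is counted in whitespace-separated words; the four control types are labelled "equal", "less", "more" and "between"; the reward is 0 when the constraint holds and otherwise a negative, target-normalized gap; and a few regular expressions in the hypothetical `extract_instruction` stand in for the standard prompt extractor, which in the paper need not be rule-based. All names (`LengthInstruction`, `extract_instruction`, `length_reward`, `filter_samples`) are illustrative, not the authors' code.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LengthInstruction:
    """Hypothetical standard control instruction: a control type plus bounds."""
    control: str                 # "equal", "less", "more" or "between" (assumed labels)
    target: int                  # target length, or the lower bound for "between"
    upper: Optional[int] = None  # upper bound, used only by "between"


# Hypothetical regex stand-in for the standard prompt extractor.
# Order matters: more specific phrasings are tried first.
_PATTERNS = [
    (re.compile(r"between (\d+) and (\d+) words"), "between"),
    (re.compile(r"(?:less|fewer) than (\d+) words"), "less"),
    (re.compile(r"more than (\d+) words"), "more"),
    (re.compile(r"(?:in|of|exactly) (\d+) words"), "equal"),
]


def extract_instruction(prompt: str) -> Optional[LengthInstruction]:
    """Map a free-form prompt to a standard instruction, or None if no match."""
    text = prompt.lower()
    for pattern, control in _PATTERNS:
        match = pattern.search(text)
        if match:
            nums = [int(g) for g in match.groups()]
            if control == "between":
                return LengthInstruction(control, nums[0], nums[1])
            return LengthInstruction(control, nums[0])
    return None


def length_reward(output: str, inst: LengthInstruction) -> float:
    """Rule-based reward (assumed form): 0.0 if the constraint holds, else minus
    the number of words by which the output misses it, divided by the target."""
    n = len(output.split())  # length in whitespace-separated words (assumption)
    if inst.control == "equal":
        gap = abs(n - inst.target)
    elif inst.control == "less":      # strictly fewer words than target
        gap = max(0, n - inst.target + 1)
    elif inst.control == "more":      # strictly more words than target
        gap = max(0, inst.target + 1 - n)
    elif inst.control == "between":   # inclusive range [target, upper]
        gap = max(0, inst.target - n, n - inst.upper)
    else:
        raise ValueError(f"unknown control type: {inst.control}")
    return 0.0 if gap == 0 else -gap / max(inst.target, 1)


def filter_samples(samples: List[str], inst: LengthInstruction,
                   threshold: float = 0.0) -> List[str]:
    """Sample filtering: keep only outputs whose reward reaches the threshold."""
    return [s for s in samples if length_reward(s, inst) >= threshold]
```

For example, `extract_instruction("Summarize the article in less than 5 words.")` yields a `"less"` instruction with target 5, and filtering `["a b c", "a b c d e f"]` against it keeps only `"a b c"`. Because the assumed reward is exactly 0 for every compliant output, sample filtering reduces to a threshold at 0; a graded reward would instead let one rank compliant samples.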

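By adopting reinforcement learning and sample filtering with reward signals given by rule-based reward models, we propose a prompt-based method that achieves length-controlled generation under different control types and significantly improves accuracy on popular summarization datasets.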

Prompt-Based Length Controlled Generation with Multiple Control Types

Recently, large language models (LLMs) such as ChatGPT and GPT-4 have attracted
great attention given their substantial performance improvements.
Length-controlled generation has emerged as an important topic for LLMs, as it
enables users to fully leverage their capabilities in more real-world
scenarios, such as generating an answer or essay of a desired length. In
addition, autoregressive generation in LLMs is time-consuming, and the ability
to control the generated length can reduce inference cost by capping the
output length, thereby meeting different needs. Therefore, we propose a
prompt-based length control method that achieves length-controlled generation
and can be widely applied to GPT-style LLMs. In particular, we adopt
reinforcement learning with reward signals given by either a trainable or a
rule-based reward model, which steers the generation of LLMs by rewarding
outputs that match a predefined target length. Experiments show that our
method significantly improves the accuracy of prompt-based length control for
the summarization task on popular datasets such as CNNDM and NYT. We believe
this length-controllable ability opens up further possibilities for LLMs.
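To illustrate how a rule-based reward for a predefined target length can steer generation during RL, here is a minimal Python sketch. The reward form (negative relative distance between the word count and the target) and the REINFORCE-style mean baseline are assumptions chosen for brevity; the abstract does not name the RL algorithm, and a trainable reward model could take the place of the hypothetical `target_length_reward`.

```python
from statistics import mean
from typing import List


def target_length_reward(output: str, target: int) -> float:
    """Hypothetical rule-based reward: 0 at the target word count,
    decreasing linearly with the relative distance from it."""
    n = len(output.split())  # word count via whitespace split (assumption)
    return -abs(n - target) / target


def advantages(outputs: List[str], target: int) -> List[float]:
    """Baseline-subtracted rewards for several samples drawn for one prompt.

    Positive values mark samples whose length is closer to the target than
    the group average; a policy-gradient step would raise their likelihood,
    and negative values would lower it.
    """
    rewards = [target_length_reward(o, target) for o in outputs]
    baseline = mean(rewards)  # simple mean baseline (assumption)
    return [r - baseline for r in rewards]


if __name__ == "__main__":
    samples = ["a b", "a b c d", "a b c d e f"]
    print(advantages(samples, target=4))
```

Here the 4-word sample receives the only positive advantage, so an update weighted by these values pushes the policy toward the requested length. Subtracting the group mean keeps the update direction informative even though every raw reward is at most 0.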

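We propose a prompt-based length control method that uses a trainable or rule-based reward model to steer the generation of large language models toward a desired length. The method is broadly applicable to GPT-style LLMs and significantly improves the accuracy of prompt-based length control on summarization tasks.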