As large language models rapidly evolve to support longer contexts, there is a notable disparity in their ability to generate correspondingly long outputs. A recent study suggests that the primary cause of this imbalance may be the lack of long-output data during alignment training. In light of this observation, attempts have been made to re-align foundation models with data that fills the gap, resulting in models capable of generating lengthy output when instructed. In this paper, we explore the impact of data quality in tuning a model for long output, and the possibility of doing so starting from human-aligned (instruct or chat) models. With careful data curation, we show that it is possible to achieve similar performance improvements in our tuned models with only a small fraction of the training data instances and compute. In addition, we assess the generalizability of such approaches by applying our tuning recipes to several models. Our findings suggest that, while the capacity for generating long output varies across models out of the box, our approach of tuning them with high-quality data using light compute consistently yields notable improvements across all models we experimented on. We have made public our curated dataset for tuning long-writing capability, the implementations of model tuning and evaluation, and the fine-tuned models, all of which are openly accessible.

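This work addresses the uneven ability of large language models to generate long outputs, in particular the gap caused by a lack of long-output training data. We propose a tuning approach based on high-quality data and show that, with careful data curation, a model's long-output capability can be improved markedly using only a small amount of training data and compute. The results indicate that the approach improves performance across different models, and we have publicly released the associated dataset and model implementations to support further research in this area.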

Minimal Tuning to Unlock Long Output: High-Quality Data Is the Key