Recent studies have shown that maintaining a consistent response style across human experts and enhancing data quality in training sets can significantly improve the performance of fine-tuned Large Language Models (LLMs) while reducing the number of training examples needed. However, the precise definition of style and the relationship between style, data quality, and LLM performance remain unclear. This research decomposes response style into presentation and composition styles and finds that, among training data of similar quality, data with higher style consistency lead to better LLM performance. Inspired by this, we introduce Style Consistency-Aware Response Ranking (SCAR), which automatically prioritizes instruction-response pairs in the training set based on the stylistic consistency of their responses. By selecting the most style-consistent examples, ranging from the top 25% down to 0.7% of the full dataset, the fine-tuned LLMs can match or even surpass the performance of models trained on the entire dataset on coding and open-ended question-answering benchmarks. Code and data are available at https://github.com/zhuang-li/SCAR.
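The selection step described above (rank instruction-response pairs by a style-consistency score, then keep only the top fraction) can be illustrated with a minimal sketch. This is not the SCAR ranker itself: the scores are assumed to come from some external scoring model, and the function name `select_top_fraction` and its parameters are hypothetical names chosen for illustration.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]  # (instruction, response)


def select_top_fraction(
    pairs: Sequence[Pair],
    scores: Sequence[float],
    fraction: float,
) -> List[Pair]:
    """Keep the highest-scoring `fraction` of instruction-response pairs.

    `scores` are assumed to be style-consistency scores produced elsewhere
    (hypothetical here); a higher score means a more style-consistent response.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if len(pairs) != len(scores):
        raise ValueError("pairs and scores must have the same length")

    # Number of examples to keep; keep at least one when a fraction is requested.
    k = max(1, int(len(pairs) * fraction))

    # Indices sorted by score, highest first.
    ranked = sorted(range(len(pairs)), key=lambda i: scores[i], reverse=True)
    return [pairs[i] for i in ranked[:k]]


# Toy data with made-up scores, for illustration only.
pairs = [("q1", "r1"), ("q2", "r2"), ("q3", "r3"), ("q4", "r4")]
scores = [0.1, 0.9, 0.5, 0.7]
subset = select_top_fraction(pairs, scores, fraction=0.5)
print(subset)  # [('q2', 'r2'), ('q4', 'r4')]
```

In this sketch the fraction plays the role of the 25% to 0.7% selection budgets mentioned above; the selected subset would then be used for fine-tuning in place of the full dataset.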

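Maintaining a consistent response style among human experts and improving data quality in the training set can significantly improve the performance of fine-tuned Large Language Models (LLMs) while reducing the number of training examples required. The study decomposes response style into presentation and composition styles and finds that, among training data of similar quality, data with higher style consistency improve LLM performance. Based on this observation, it introduces Style Consistency-Aware Response Ranking (SCAR), which automatically prioritizes the instruction-response pairs in the training set according to the style consistency of their responses. By selecting the most style-consistent examples, from the top 25% down to 0.7% of the full dataset, the fine-tuned LLMs can match or even surpass the performance of models trained on the entire dataset on coding and open-ended question-answering benchmarks.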

SCAR: Efficient Instruction-Tuning for Large Language Models via Style Consistency-Aware Response Ranking