Supervised Fine-Tuning (SFT) serves as a crucial phase in aligning Large
Language Models (LLMs) to specific task prerequisites. The selection of
fine-tuning data profoundly influences the model's performance, whose principle
is traditionally grounded in data quality and distribution. In this paper, we
introduce a new dimension in SFT data selection: learnability. This new
dimension is motivated by the intuition that SFT unlocks capabilities acquired
by a LLM during the pretraining phase. Given that different pretrained models
have disparate capabilities, the SFT data appropriate for one may not suit
another. Thus, we introduce the term learnability to define the suitability of
data for effective learning by the model. We present the Loss Based SFT Data
Selection (LoBaSS) method, utilizing data learnability as the principal
criterion for the selection SFT data. This method provides a nuanced approach,
allowing the alignment of data selection with inherent model capabilities,
ensuring optimal compatibility and learning efficiency. In experimental
comparisons involving 7B and 13B models, our LoBaSS method is able to surpass
full-data fine-tuning at merely 6% of the total training data. When employing
16.7% of the data, LoBaSS harmonizes the model's capabilities across
conversational and mathematical domains, proving its efficacy and adaptability.

利用数据的可学习性作为选择模型数据的主要标准，研究通过引入损失为基础的 SFT 数据选择方法（LoBaSS）来确保数据选择与模型能力的匹配，从而提高对话和数学领域的模型能力。LoBaSS 方法在仅使用总训练数据的 6% 的情况下，超过全数据微调方法，在使用 16.7% 的数据时，能够协调模型在对话和数学领域的能力，验证其有效性和适应性。