This paper introduces a novel approach for efficiently distilling LLMs into smaller, application-specific models, significantly reducing operational costs and manual labor. To address the challenge of deploying computationally intensive LLMs in specific applications or on edge devices, the technique uses LLMs' reasoning capabilities to generate labels and natural language rationales for unlabeled data. Our approach enhances both finetuning and distillation by employing a multi-task training framework in which student models learn to reproduce these rationales alongside the teacher's predictions. Key contributions include the use of zero-shot prompting to elicit teacher model rationales, which reduces the need for handcrafted few-shot examples and lowers the overall token count required; this translates directly into cost savings under the pay-per-token billing model of major tech companies' LLM APIs. Additionally, the paper investigates the impact of explanation properties on distillation efficiency, demonstrating that only minimal performance loss occurs even when rationale augmentation is applied to just part of the dataset, which allows further reductions in token usage. This research marks a step toward the efficient training of task-specific models with minimal human intervention, offering substantial cost savings while maintaining, or even enhancing, performance.
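
The multi-task setup described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the prompt wording in `ZERO_SHOT_TEMPLATE`, the `[label]` and `[rationale]` task prefixes, the `TeacherOutput` and `StudentExample` containers, and the weighted-sum form of `multitask_loss`. The sketch shows how teacher outputs might be turned into two kinds of student training targets, with rationales present for only part of the data.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical zero-shot template: no handcrafted few-shot examples are included,
# so the prompt stays short. The paper's actual wording is not given in the abstract.
ZERO_SHOT_TEMPLATE = (
    "{question}\n"
    "Give the answer, then explain your reasoning step by step."
)


@dataclass
class TeacherOutput:
    text: str                 # unlabeled input sent to the teacher LLM
    label: str                # label predicted by the teacher
    rationale: Optional[str]  # None when no rationale was requested for this input


@dataclass
class StudentExample:
    source: str  # input sequence for the student model
    target: str  # sequence the student is trained to produce


def build_zero_shot_prompt(question: str) -> str:
    """Fill the (assumed) zero-shot template with a single unlabeled input."""
    return ZERO_SHOT_TEMPLATE.format(question=question)


def build_student_examples(outputs: List[TeacherOutput]) -> List[StudentExample]:
    """Create one label task per input, plus a rationale task where one exists."""
    examples: List[StudentExample] = []
    for out in outputs:
        examples.append(StudentExample(f"[label] {out.text}", out.label))
        if out.rationale is not None:
            examples.append(StudentExample(f"[rationale] {out.text}", out.rationale))
    return examples


def multitask_loss(label_loss: float, rationale_loss: float,
                   rationale_weight: float = 1.0) -> float:
    """Combine both task losses; a weighted sum is an assumption of this sketch."""
    return label_loss + rationale_weight * rationale_loss
```

In this sketch, inputs without a rationale still contribute a label task, which mirrors the idea of applying rationale augmentation to only part of the dataset in order to save teacher tokens.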

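This paper presents a novel method for efficiently distilling LLMs into smaller, application-specific models, significantly reducing operating costs and manual labor. The method uses the reasoning capabilities of LLMs to generate labels and natural language explanations for unlabeled data, thereby improving model finetuning and distillation when data and compute resources are limited. Key contributions include using zero-shot prompting to obtain explanations from the teacher model, which reduces the need for handcrafted few-shot examples and lowers the token count; this translates directly into cost savings under the pay-per-token billing of major technology companies' LLM APIs. The paper also studies how explanation properties affect distillation efficiency and shows that performance loss is small even when rationale augmentation is not applied to the entire dataset, further reducing the number of tokens. This research is a step toward efficiently training task-specific models with minimal human intervention, offering the possibility of substantially lower costs while maintaining or even improving performance.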

Leveraging Zero-Shot Prompting for Efficient Language Model Distillation