Large Language Models (LLMs) have shown promising performance in knowledge-intensive reasoning tasks that require a compound understanding of knowledge. However, deployment of the LLMs in real-world applications can be challenging due to their high computational requirements and concerns on data privacy. Previous studies have focused on building task-specific small language models (LMs) by fine-tuning them with labeled data or distilling LLMs. However, these approaches are ill-suited for knowledge-intensive reasoning tasks due to the limited capacity of small LMs in memorizing the knowledge required. Motivated by our theoretical analysis on memorization, we propose Knowledge-Augmented Reasoning Distillation (KARD), a novel method that fine-tunes small LMs to generate rationales with augmented knowledge retrieved from an external knowledge base. Moreover, we further propose a neural reranker to obtain documents relevant to rationale generation. We empirically show that KARD significantly improves the performance of small T5 and Flan-T5 models on the challenging knowledge-intensive reasoning datasets, namely MedQA-USMLE and StrategyQA. Notably, our method makes the 250M models achieve superior performance against the fine-tuned 3B models, having 12 times larger parameters, on both MedQA-USMLE and StrategyQA benchmarks.

提出了知识增强推理提炼（KARD）这一新颖的方法，以从外部知识库检索的增强知识fine-tune小型LM，来生成 rationale，并且进一步提出了神经重新排序器以获取与理性产生相关的文档。该方法在知识密集型推理数据集上显著提高了小型T5和Flan-T5模型的性能。

知识增强的推理蒸馏：面向知识密集型任务的小型语言模型