December 2022
Teaching Small Language Models to Reason
Lucie Charlotte Magister, Jonathan Mallinson, Jakub Adamek, Eric Malmi, Aliaksei Severyn
TL;DR
This paper explores transferring the reasoning capabilities of large language models to models with fewer than 100 billion parameters via knowledge distillation. The approach improves task performance, with notable gains on arithmetic, commonsense, and symbolic reasoning datasets; for example, after finetuning on chains of thought generated by PaLM-540B, the accuracy of T5 XXL on GSM8K rises from 8.11% to 21.99%.
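The method amounts to standard sequence-to-sequence finetuning of the student on teacher-generated rationales. Below is a minimal sketch, assuming a Hugging Face T5 checkpoint as the student and one pre-collected (question, teacher chain of thought) pair; the checkpoint name, hyperparameters, and toy example are illustrative stand-ins, not the paper's exact setup.

# Minimal sketch of chain-of-thought knowledge distillation (illustrative,
# not the paper's exact configuration).
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")            # stand-in for T5 XXL
student = T5ForConditionalGeneration.from_pretrained("t5-small")

# One teacher-generated training pair (hypothetical GSM8K-style example).
question = "Natalia sold 48 clips in April and half as many in May. How many clips in total?"
teacher_cot = "In May she sold 48 / 2 = 24 clips. 48 + 24 = 72. The answer is 72."

inputs = tokenizer(question, return_tensors="pt")
labels = tokenizer(teacher_cot, return_tensors="pt").input_ids

# Standard seq2seq finetuning step: the student learns to emit the
# teacher's rationale followed by the final answer.
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-4)
loss = student(**inputs, labels=labels).loss
loss.backward()
optimizer.step()

In practice this step would run over the full set of teacher-generated chains of thought; the single update above only illustrates the training signal.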
Abstract
Chain of thought prompting successfully improves the reasoning capabilities of large language models, achieving state of the art results on a range of datasets. However, these …
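For context, a few-shot chain-of-thought prompt of the kind used to elicit step-by-step rationales from a large teacher model looks roughly as follows; the exemplar, helper function, and question are illustrative, not taken from the paper's prompt set.

# Hypothetical few-shot chain-of-thought prompt: one worked exemplar,
# followed by the new question the teacher model should reason through.
COT_PROMPT = """Q: Roger has 5 tennis balls. He buys 2 more cans of 3 tennis balls each. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 balls each is 6 balls. 5 + 6 = 11. The answer is 11.

Q: {question}
A:"""

def build_cot_prompt(question: str) -> str:
    """Format the prompt so the model continues with its own rationale."""
    return COT_PROMPT.format(question=question)

print(build_cot_prompt("A robe takes 2 bolts of blue fiber and half that much white fiber. How many bolts in total?"))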