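TL;DR: A training-free fine-tuning approach based on Dynamic Sparse No Training (DSnoT) effectively improves the performance of sparse language models and opens up the potential of applying sparsity to large language models.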
Abstract
The ever-increasing large language models (LLMs), though opening a potential path for the upcoming artificial general intelligence, sadly drop a daunting obstacle on the way towards their on-device deployment. As one of the most well-established pre-LLM approaches in reducing model c