Feb 2024
Higher Layers Need More LoRA Experts
Chongyang Gao, Kezhen Chen, Jinmeng Rao, Baochen Sun, Ruibo Liu...
TL;DR
The study proposes a novel parameter-efficient MoE method, called MoLA, for Transformer-based models, which allocates a different number of LoRA experts to each model layer. The method matches or outperforms baselines on six well-known NLP and commonsense QA benchmarks, and can serve as a plug-and-play parameter-efficient tuning approach for a wide range of applications.
Abstract
Parameter-efficient tuning (PEFT) techniques like low-rank adaptation (LoRA) offer training efficiency on Large Language Models, but their impact on model performance remains limited. Recent efforts integrate LoRA and Mixture-of-Experts (MoE) to improve the performance of PEFT methods. …
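
The per-layer expert allocation described in the TL;DR can be illustrated with a minimal PyTorch sketch. The module names, expert counts, and the top-2 routing below are illustrative assumptions, not the authors' reference implementation: each frozen linear layer gains a small pool of LoRA experts, and higher layers are given larger pools.

```python
# Minimal sketch of layer-wise LoRA-expert allocation in the spirit of MoLA.
# Expert counts, rank, and top-k routing here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRAExpert(nn.Module):
    """One low-rank adapter: output = (alpha / r) * B(A(x))."""
    def __init__(self, d_in, d_out, r=8, alpha=16):
        super().__init__()
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, r))
        self.scale = alpha / r

    def forward(self, x):
        return F.linear(F.linear(x, self.A), self.B) * self.scale

class MoLoRALayer(nn.Module):
    """Frozen base linear layer plus a router over several LoRA experts."""
    def __init__(self, base_linear, num_experts, r=8, top_k=2):
        super().__init__()
        self.base = base_linear
        for p in self.base.parameters():
            p.requires_grad = False  # only the adapters and router are trained
        d_in, d_out = base_linear.in_features, base_linear.out_features
        self.experts = nn.ModuleList(
            LoRAExpert(d_in, d_out, r) for _ in range(num_experts))
        self.router = nn.Linear(d_in, num_experts)
        self.top_k = min(top_k, num_experts)

    def forward(self, x):
        out = self.base(x)
        logits = self.router(x)                         # (..., num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)  # pick top-k experts
        weights = weights.softmax(dim=-1)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., k] == e).unsqueeze(-1)
                out = out + mask * weights[..., k:k + 1] * expert(x)
        return out

# Hypothetical allocation: fewer experts in lower layers, more in higher ones.
experts_per_layer = [2, 2, 4, 4, 6, 6, 8, 8]
layers = nn.ModuleList(
    MoLoRALayer(nn.Linear(64, 64), n) for n in experts_per_layer)

x = torch.randn(3, 64)
for layer in layers:
    x = layer(x)
print(x.shape)  # torch.Size([3, 64])
```

The loop over experts keeps the sketch simple; a production implementation would batch the expert computation, but the trainable-parameter budget per layer is still governed by the entries of `experts_per_layer`, which is the knob the paper's title refers to.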