BriefGPT.xyz
Jun, 2024
Compress then Serve: Serving Thousands of LoRA Adapters with Little Overhead
Rickard Brüel-Gabrielsson, Jiacheng Zhu, Onkar Bhardwaj, Leshem Choshen, Kristjan Greenewald...
TL;DR
Serving LoRAs after compressing them into a shared basis plus LoRA-specific scaling matrices improves throughput while preserving 75% of performance.
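The compression the TL;DR describes can be sketched with plain linear algebra: stack the adapters' weight updates, extract a shared low-rank basis, and keep only a small per-adapter scaling matrix. The sketch below is a toy illustration under assumed names and sizes (dimensions, rank `k`, and the SVD-based basis choice are assumptions), not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n = 64, 4, 10  # hidden dim, LoRA rank, number of adapters (toy sizes)

# Toy LoRA updates: adapter i contributes Delta_i = B_i @ A_i (d x d, rank r).
deltas = [rng.normal(size=(d, r)) @ rng.normal(size=(r, d)) for _ in range(n)]

# Shared bases (assumption: chosen via SVD of the stacked updates), so that
# Delta_i is approximated as U @ Sigma_i @ V.T with a small k x k Sigma_i.
k = 16  # shared basis rank (assumption; tune per workload)
U, _, _ = np.linalg.svd(np.hstack(deltas), full_matrices=False)
U = U[:, :k]                                  # shared column basis (d x k)
_, _, Vt = np.linalg.svd(np.vstack(deltas), full_matrices=False)
V = Vt[:k].T                                  # shared row basis (d x k)

# Per-adapter scaling matrices: only these k x k blocks are adapter-specific.
sigmas = [U.T @ D @ V for D in deltas]

# Relative reconstruction error for one adapter.
approx = U @ sigmas[0] @ V.T
err = np.linalg.norm(deltas[0] - approx) / np.linalg.norm(deltas[0])
```

Serving then needs only the two shared bases in GPU memory plus one `k × k` matrix per adapter, instead of a full `B_i, A_i` pair each, which is where the throughput gain in the TL;DR comes from.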
Abstract
Fine-tuning large language models (LLMs) with low-rank adapters (LoRAs) has become common practice, often yielding numerous copies of the …