BriefGPT.xyz
Feb, 2024
APIServe: Efficient API Support for Large-Language Model Inferencing
Reyna Abhyankar, Zijian He, Vikranth Srivatsa, Hao Zhang, Yiying Zhang
TL;DR
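APIServe is the first inference framework targeting API-augmented LLMs. It reduces the GPU resource waste caused by API calls, improves overall serving throughput by 1.6x, and completes 2x more requests per second than existing LLM inference systems.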
Abstract
Large language models are increasingly integrated with external tools and APIs like ChatGPT plugins to extend their capability beyond language-centric tasks. However, today's LLM inference systems are designed fo