BriefGPT.xyz
Jun, 2024
T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge
Jianyu Wei, Shijie Cao, Ting Cao, Lingxiao Ma, Lei Wang...
TL;DR
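T-MAC is a novel lookup-table (LUT) based method for efficient inference of low-bit LLMs (that is, LLMs with quantized weights) on CPUs. It supports mixed-precision matrix multiplication (mpGEMM) while eliminating multiplications and reducing the number of additions required.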
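The table-lookup idea can be illustrated with a toy matrix-vector product. The sketch below is not T-MAC's actual kernel. It assumes 1-bit weights in {-1, +1} and a group size of 4, and the function names (`build_tables`, `pack_row`, `lut_matvec`) are hypothetical. For each group of activations it precomputes the partial sum for every possible weight bit pattern, so each output element becomes a sum of table lookups with no multiplications in the inner loop.

```python
G = 4  # group size; an assumption chosen for illustration only


def build_tables(activations, g=G):
    """For each group of g activations, precompute the partial sum
    for every one of the 2**g possible +1/-1 weight patterns."""
    assert len(activations) % g == 0
    tables = []
    for start in range(0, len(activations), g):
        group = activations[start:start + g]
        table = []
        for pattern in range(1 << g):
            total = 0.0
            for j, a in enumerate(group):
                # bit j set means weight +1, bit j clear means weight -1
                total += a if (pattern >> j) & 1 else -a
            table.append(total)
        tables.append(table)
    return tables


def pack_row(weights, g=G):
    """Pack a row of +1/-1 weights into one g-bit table index per group."""
    indices = []
    for start in range(0, len(weights), g):
        idx = 0
        for j, w in enumerate(weights[start:start + g]):
            if w == 1:
                idx |= 1 << j
        indices.append(idx)
    return indices


def lut_matvec(packed_rows, tables):
    """Each output element is a sum of table lookups, one per group."""
    return [
        sum(table[idx] for table, idx in zip(tables, row))
        for row in packed_rows
    ]


def reference_matvec(weight_rows, activations):
    """Plain multiply-accumulate version, used only to check the result."""
    return [sum(w * a for w, a in zip(row, activations)) for row in weight_rows]


if __name__ == "__main__":
    acts = [0.5, -1.0, 2.0, 0.25, 1.5, -0.5, 0.0, 3.0]
    rows = [[1, -1, 1, 1, -1, -1, 1, 1], [-1, -1, -1, -1, 1, 1, 1, 1]]
    tables = build_tables(acts)
    packed = [pack_row(r) for r in rows]
    print(lut_matvec(packed, tables))
    print(reference_matvec(rows, acts))
```

The tables depend only on the activations, so they are built once and reused across every weight row. That reuse is where the saving in additions comes from in this toy version: each row needs one lookup and one addition per group instead of one multiply-add per element.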
Abstract
The deployment of large language models (LLMs) on edge devices is increasingly important to enhance on-device intelligence. Weight quantization is crucial for reducing the memory footprint of LLMs on devices. How…