Language models have proven effective in a variety of software applications, particularly in tasks related to workflow automation. These models can call functions, an ability that is essential for building AI agents. Although large-scale language models perform well in cloud environments, they raise concerns over privacy and cost. Current on-device models for function calling suffer from high latency and low accuracy. Our research presents a new method that enables an on-device model with 2 billion parameters to surpass GPT-4 in both accuracy and latency while reducing the context length by 95%. Compared with Llama-7B using a RAG-based function calling mechanism, our method improves latency 35-fold. This brings latency down to levels suitable for deployment across a variety of edge devices in production environments, meeting the performance requirements of real-world applications.

Octopus v2: On-device language model for super agent