BriefGPT.xyz
Nov, 2023
Leveraging Speculative Sampling and KV-Cache Optimizations Together for Generative AI using OpenVINO
Haim Barad, Ekaterina Aidova, Yury Gorbachev
TL;DR
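Speculative sampling, a form of dynamic execution, can be combined with inference optimizations and model optimization techniques such as quantization to provide an optimized solution. The KV cache is used during execution.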
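To make the idea concrete, here is a minimal sketch of the draft-then-verify loop behind speculative decoding. It is not the authors' OpenVINO implementation: it uses the simpler greedy (exact-match) variant rather than probabilistic accept/reject sampling, the two "models" (`target_next`, `draft_next`) are hypothetical toy functions, and no KV cache is modeled. In a real system the verification step would be one batched forward pass of the large model that reuses cached keys and values.

```python
from typing import List, Tuple


def target_next(tokens: List[int]) -> int:
    """Hypothetical 'large' model: deterministic next token from the last token."""
    return (tokens[-1] * 3 + 1) % 10


def draft_next(tokens: List[int]) -> int:
    """Hypothetical 'small' model: agrees with the target except after token 7."""
    last = tokens[-1]
    return 0 if last == 7 else (last * 3 + 1) % 10


def greedy(prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one target-model call per generated token."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(target_next(tokens))
    return tokens


def speculative_greedy(prompt: List[int], n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Draft proposes up to k tokens; the target keeps the longest agreeing prefix."""
    tokens = list(prompt)
    goal = len(prompt) + n_new
    target_passes = 0
    while len(tokens) < goal:
        # 1) The cheap draft model proposes a short continuation.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(tokens))):
            proposal.append(draft_next(tokens + proposal))
        # 2) The target checks the proposal. Counted as ONE pass here, because a
        #    real model would score all proposed positions in a single batch.
        target_passes += 1
        for tok in proposal:
            expected = target_next(tokens)  # tokens already holds accepted ones
            if tok != expected:
                tokens.append(expected)     # replace the first wrong token, stop
                break
            tokens.append(tok)              # draft token accepted
    return tokens, target_passes


if __name__ == "__main__":
    out, passes = speculative_greedy([2], 12)
    print(out == greedy([2], 12), passes)
```

Because every accepted token is checked against the target model, the output matches plain greedy decoding with the target; the saving comes from needing fewer target passes whenever the draft model guesses correctly.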
Abstract
Inference optimizations are critical for improving user experience and reducing infrastructure costs and power consumption. In this article, we illustrate a form of dynamic execution known as …