The advent of pre-trained large language models (LLMs) has revolutionized
various natural language processing tasks. These models predominantly decode
auto-regressively, using Key-Value (KV) caches to avoid recomputing the
attention keys and values of previously generated tokens. Neve
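The caching idea can be illustrated with a minimal NumPy sketch (all names here are illustrative, not from any particular model): at each decoding step, only the newest token's key and value are projected and appended to the cache, while attention reuses the stored keys and values of all earlier tokens instead of recomputing them.

```python
import numpy as np

def attend(q, K, V):
    # Scaled dot-product attention for a single query vector.
    scores = q @ K.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

class KVCache:
    """Toy per-layer KV cache: grows by one row per decoded token."""
    def __init__(self, d):
        self.K = np.empty((0, d))
        self.V = np.empty((0, d))

    def append(self, k, v):
        self.K = np.vstack([self.K, k])
        self.V = np.vstack([self.V, v])

def decode_step(x, Wq, Wk, Wv, cache):
    # Only the NEW token's key/value are projected; the cached K/V of
    # all previous tokens are reused rather than recomputed.
    cache.append(x @ Wk, x @ Wv)
    return attend(x @ Wq, cache.K, cache.V)
```

Each cached step produces the same attention output as recomputing the full key/value matrices for the whole prefix, but the per-step projection cost stays constant instead of growing with sequence length.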