In this technical report, we present Zamba, a novel 7B SSM-transformer hybrid
model which achieves competitive performance against leading open-weight models
at a comparable scale. Zamba is trained on 1T tokens from openly available
datasets and is the best non-transformer model at this scale. Zamba pioneers a
unique architecture combining a Mamba backbone with a single shared attention
module, thus obtaining the benefits of attention at minimal parameter cost. Due
to its architecture, Zamba is significantly faster at inference than comparable
transformer models and requires substantially less memory for generation of
long sequences. Zamba is pretrained in two phases: the first phase is based on
existing web datasets, while the second one consists of annealing the model
over high-quality instruct and synthetic datasets, and is characterized by a
rapid learning rate decay. We open-source the weights and all checkpoints for
Zamba, from both the phase 1 and annealing phases.
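
As a rough illustration of the two-phase schedule described above, the following sketch pairs a slow decay in phase 1 with a rapid decay during annealing. The function name `two_phase_lr`, the cosine and exponential shapes, and every numeric value are assumptions made for illustration; the text above states only that the annealing phase uses a rapid learning rate decay.

```python
import math


def two_phase_lr(step, phase1_steps, anneal_steps,
                 peak_lr=1e-3, phase1_final_lr=1e-4, anneal_final_lr=1e-6):
    """Hypothetical two-phase learning rate schedule.

    Phase 1: slow cosine decay from peak_lr to phase1_final_lr.
    Annealing: rapid exponential decay from phase1_final_lr to anneal_final_lr.
    All shapes and default values are illustrative assumptions.
    """
    if step < phase1_steps:
        progress = step / phase1_steps
        cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
        return phase1_final_lr + (peak_lr - phase1_final_lr) * cosine
    # Fraction of the annealing phase completed, clamped to [0, 1].
    t = min(step - phase1_steps, anneal_steps) / anneal_steps
    return phase1_final_lr * (anneal_final_lr / phase1_final_lr) ** t
```

With these assumed defaults, the annealing phase covers two orders of magnitude of learning rate in far fewer steps than phase 1 needs for one, which is the sense of "rapid" this sketch tries to capture.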

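Zamba is a distinctive 7B SSM-transformer hybrid model that combines a Mamba backbone with a single shared attention module, matching the performance of leading open-weight models at minimal parameter cost while offering faster inference and lower memory requirements. It is pretrained in two phases: the first on existing web datasets, the second on high-quality instruct and synthetic datasets.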
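
To make the "minimal parameter cost" point concrete, the sketch below counts attention parameters when one module is reused at every call site versus when each call site has its own. The helper names, the `4 * d_model**2` estimate (Q, K, V, and output projections, biases ignored), and the assumption that attention is invoked once every `period` backbone blocks are all hypothetical; the summary above specifies only that a single attention module is shared.

```python
def attention_params(d_model):
    """Approximate parameters of one attention module (assumed: Q, K, V, O projections, no biases)."""
    return 4 * d_model * d_model


def hybrid_attention_cost(d_model, n_blocks, period, shared):
    """Return (number of attention calls, total attention parameters).

    Assumes, for illustration only, that attention is invoked once every
    `period` backbone blocks. With shared=True the same weights are reused
    at every call site, so they are counted once.
    """
    n_calls = n_blocks // period
    n_modules = min(1, n_calls) if shared else n_calls
    return n_calls, n_modules * attention_params(d_model)


# Hypothetical sizes, chosen only to show the ratio between the two designs.
calls, shared_cost = hybrid_attention_cost(1024, 24, 6, shared=True)
_, unshared_cost = hybrid_attention_cost(1024, 24, 6, shared=False)
print(calls, unshared_cost // shared_cost)
```

Under these assumptions, the number of attention calls stays the same in both designs, while the parameter count of the shared variant does not grow with the number of call sites.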