Recently, multiple architectures have been proposed to improve the efficiency of transformer language models by redesigning the self-attention block to achieve linear-cost inference (LCI). A notable approach in this direction is the State-Space Models (SSMs) architecture.
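To make the linear-cost property concrete, the sketch below shows the generic discretized linear state-space recurrence that SSM-style layers build on (this is an illustrative toy, not any specific published model; the matrices `A`, `B`, `C` and the dimensions `d_model`, `d_state` are assumed names). Because each token only updates a fixed-size hidden state, the per-token inference cost is constant in sequence length, unlike self-attention's growing key-value cache.

```python
import numpy as np

# Toy linear state-space recurrence:
#   h_t = A h_{t-1} + B x_t,   y_t = C h_t
# Per-token cost is O(d_state * d_model) regardless of how many
# tokens have been processed, hence linear total cost in length.

d_model, d_state = 4, 8
rng = np.random.default_rng(0)
A = 0.9 * np.eye(d_state)             # state transition (stable toy choice)
B = rng.normal(size=(d_state, d_model))  # input projection
C = rng.normal(size=(d_model, d_state))  # output projection

def ssm_step(h, x):
    """One inference step: constant work per token."""
    h = A @ h + B @ x   # update the fixed-size hidden state
    y = C @ h           # emit the output for this token
    return h, y

h = np.zeros(d_state)
for x in rng.normal(size=(10, d_model)):  # a stream of 10 tokens
    h, y = ssm_step(h, x)
```

Practical SSM layers differ in how `A`, `B`, and `C` are parameterized and discretized, but they share this recurrent inference mode.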