With the growth of model sizes and the scale of their deployment, their sheer
size burdens the infrastructure, requiring more network bandwidth and storage to
accommodate them. While there is a vast literature on reducing model sizes,
we investigate a more traditional type of compression -- one that compresses
the model to a smaller form and is coupled with a decompression algorithm that
restores it to its original form.
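To make the distinction concrete, a minimal sketch of such a lossless compress/decompress round trip is shown below. This is not the method studied here; it assumes a toy weight vector and uses generic `zlib` compression over serialized bytes purely as a stand-in, to illustrate that decompression recovers the model bit-for-bit (unlike pruning or quantization, which alter it).

```python
import random
import struct
import zlib

# Toy "model": a flat list of weights (an assumption for illustration only).
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(10_000)]

# Serialize to raw bytes, then compress losslessly with a generic codec.
raw = struct.pack(f"{len(weights)}d", *weights)
compressed = zlib.compress(raw, 9)

# Decompress and deserialize: the round trip is exact.
restored = list(struct.unpack(f"{len(weights)}d", zlib.decompress(compressed)))
assert restored == weights

print(f"original: {len(raw)} bytes, compressed: {len(compressed)} bytes")
```

Generic codecs like `zlib` compress floating-point weights poorly; the point of the sketch is only the contract — smaller form on the wire or disk, identical model after decompression.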