We investigate the compression of deep neural networks by quantizing their
weights and activations into multiple binary bases, known as multi-bit networks
(MBNs), which accelerate inference and reduce storage for deployment on
low-resource mobile and embedded platforms. We