Recent work has shown that fast, compact low-bitwidth neural networks can be
surprisingly accurate. These networks use homogeneous binarization: all
parameters in each layer or (more commonly) in the whole model share the same low
bitwidth (e.g., 2 bits). However, modern hardware allows ef