Ensembles of models have been empirically shown to improve predictive
performance and to yield robust measures of uncertainty. However, they are
expensive in computation and memory. Therefore, recent research has focused on
distilling ensembles into a single compact model, reducing the