Model distillation is an effective and widely used technique for transferring knowledge from a teacher network to a student network. The typical application is to transfer from a powerful large network or ensemble to a small network that is better suited to low-memory or fast-execution requirements.
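As an illustration of the transfer, a common formulation (in the style of Hinton et al.'s knowledge distillation, not a method prescribed here) trains the student on a weighted mix of a soft-target loss against the teacher's temperature-scaled outputs and the usual hard-label cross-entropy. The sketch below assumes PyTorch; the function name and the `temperature`/`alpha` hyperparameters are illustrative choices.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Hinton-style distillation loss (illustrative sketch).

    `temperature` and `alpha` are assumed hyperparameters,
    not values taken from this paper.
    """
    # Soften both output distributions with the temperature.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)

    # KL divergence between teacher and student soft targets,
    # scaled by T^2 to keep gradient magnitudes comparable.
    kd = F.kl_div(log_soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2

    # Standard cross-entropy against the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)

    return alpha * kd + (1.0 - alpha) * ce
```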