Abstract
Pre-trained language models have been applied to various
NLP tasks with considerable performance gains. However, the large model sizes, together with the long inference time, limit the deployment of such models in real-time applications. Typical approaches consider