Pre-trained deep neural network language models such as ELMo, GPT, bert and XLNet have recently achieved state-of-the-art performance on a variety of language understanding tasks. However, their size makes them impractical for a number of scenarios, especially on mobile and edge device