Various natural language processing (NLP) tasks necessitate models that are
efficient and small based on their ultimate application at the edge or in other
resource-constrained environments. While prior research has reduced the size of
these models, increasing computational efficiency