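TL;DR: This work explores Model Cascading, a technique that cascades a collection of models of varying capacities. It can improve both the computational efficiency and the prediction accuracy of NLP systems, and introducing more models into the cascade can further improve efficiency.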
Abstract
Do all instances need inference through the big models for a correct
prediction? Perhaps not; some instances are easy and can be answered correctly
by even small-capacity models. This provides opportunities for improving the
computational efficiency of systems. In this work, we present