We introduce SHARCS for adaptive inference that takes into account the
hardness of input samples. SHARCS can train a router on any transformer
network, enabling the model to direct different samples to sub-networks with
varying widths. Our experiments demonstrate that: (1) SHARCS outperforms or
complements existing per-sample adaptive inference methods across various
classification tasks in terms of accuracy vs. FLOPs; (2) SHARCS generalizes
across different architectures and can even be applied to compressed and
efficient transformer encoders to further improve their efficiency; and (3) SHARCS
can provide a 2× inference speedup with an insignificant drop in accuracy.
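The routing idea above can be pictured with a toy, dependency-free sketch. It is not the SHARCS implementation, and several parts are assumptions. The router is a plain linear scorer whose weights are supplied by hand, whereas SHARCS trains its router. The three width ratios in `WIDTH_RATIOS` are hypothetical. A narrower sub-network is formed by keeping a prefix of a single linear layer's output neurons. In a real transformer the width reduction would apply inside the encoder layers. All function names here are illustrative.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical width ratios, one per router "hardness" bucket (easy -> hard).
WIDTH_RATIOS: Sequence[float] = (0.25, 0.5, 1.0)


def matvec(weight: List[List[float]], x: Sequence[float]) -> List[float]:
    """Dense matrix-vector product: one output per row of `weight`."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def route(router_weight: List[List[float]], x: Sequence[float]) -> int:
    """Return the index of the bucket with the highest router score (argmax)."""
    scores = matvec(router_weight, x)
    return max(range(len(scores)), key=scores.__getitem__)


def sliced_layer(weight: List[List[float]], x: Sequence[float], ratio: float) -> List[float]:
    """Evaluate only the first ceil(ratio * width) output neurons of a linear layer.

    Keeping a prefix of the neurons yields a narrower sub-network that shares
    its weights with the full-width layer (an assumed slicing scheme).
    """
    width = max(1, math.ceil(ratio * len(weight)))
    return matvec(weight[:width], x)


def adaptive_forward(
    router_weight: List[List[float]],
    layer_weight: List[List[float]],
    x: Sequence[float],
) -> Tuple[List[float], float, int]:
    """Route one sample to a width, run the sliced layer, and count its cost."""
    ratio = WIDTH_RATIOS[route(router_weight, x)]
    out = sliced_layer(layer_weight, x, ratio)
    # Multiply-accumulates actually executed: one per (output, input) pair.
    macs = len(out) * len(x)
    return out, ratio, macs


if __name__ == "__main__":
    layer = [[1, 0], [0, 1], [1, 1], [1, -1]]  # 4 output neurons, 2 inputs
    router = [[1, 0], [0, 0], [0, 1]]          # hand-set stand-in for a trained router
    for sample in ([2, 1], [-1, -1], [1, 3]):
        print(sample, adaptive_forward(router, layer, sample))
```

In this toy, a sample routed to the 0.5 bucket executes half of the layer's multiply-accumulates compared with the full-width path. Per-sample savings of this kind are what the accuracy vs. FLOPs comparison measures. The actual cost, and any wall-clock speedup, depend on which parts of the real model are narrowed.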

SHARCS 是一种自适应推理方法，通过考虑输入样本的难度，训练了一个路由器来将不同样本定向到具有不同宽度的子网络，实验证明，SHARCS 在准确性与 FLOPs 方面优于或补充了现有的逐样本自适应推理方法，能够泛化到不同的架构，甚至应用于压缩和高效的 Transformer 编码器以进一步提高其效率，并且能够在几乎不损失准确性的情况下提供 2 倍的推理加速。