In this article, we use probing to investigate phenomena that occur during fine-tuning and knowledge distillation of a BERT-based natural language understanding (NLU) model. Our ultimate purpose is to use probing to better understand practical production problems and, consequently, to build better NLU models. We design experiments to see how fine-tuning changes the linguistic capabilities of BERT, what the optimal size of the fine-tuning dataset is, and how much information is contained in a distilled NLU model based on a tiny Transformer. The results show that the probing paradigm in its current form is not well suited to answering such questions: Structural, Edge, and Conditional probes do not take into account how easy it is to decode the probed information. We therefore conclude that quantifying information decodability is critical for many practical applications of the probing paradigm.

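This paper uses probing to investigate phenomena that arise in a BERT-based natural language understanding (NLU) model during fine-tuning and knowledge distillation. The experimental results show that the probing paradigm in its current form is not well suited to answering these questions; quantifying information decodability is therefore critical for many practical applications of the probing paradigm.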

Can probing be used to better understand fine-tuning and knowledge distillation of a BERT NLU model?