Understanding how linguistic structures are encoded in contextualized
embeddings could help explain their impressive performance across NLP tasks. Existing
approaches for probing these representations usually train classifiers and use
accuracy, mutual information, or complexity as a proxy for