Recent studies have demonstrated the substantial advantage of cross-lingual
pre-trained models (PTMs), such as multilingual BERT and XLM, on cross-lingual
NLP tasks. However, existing approaches essentially capture the co-occurrence
among tokens via the masked language modeling (MLM) objective.