Contextualized word embeddings have recently outperformed static word embeddings on many NLP tasks. However, little is known about the internal representations that BERT produces. Do they exhibit common patterns? How does word sense relate to context? We find that nearly all the contextualized word vectors of BERT and