Most Chinese pre-trained encoders take the character as the basic unit and learn representations from each character's external context, ignoring the semantics expressed by the word, which is the smallest meaningful unit in Chinese. Hence, we propose a novel word-aligned attention to incorporate word segmentation information, which is complementary to variou