The attention model has become a standard component in neural machine
translation (NMT), guiding the translation process by selectively focusing on
parts of the source sentence when predicting each target word. However, we find
that the generation of a target word depends not only