Neural machine translation (NMT) generates the next target token conditioned on
the previous ground-truth target tokens during training, but on its own
previously generated target tokens during inference. This mismatch causes a discrepancy
between training and inference as well as error propagation, and