In neural machine translation (NMT), the most common practice is to stack a
number of recurrent or feed-forward layers in the encoder and the decoder. As a
result, the addition of each new layer improves the translation quality
significantly. However, this also leads to a significant i