Training efficiency is one of the main problems for Neural Machine
Translation (NMT). Deep networks need for very large data as well as many
training iterations to achieve state-of-the-art performance. This results in
very high computation cost, slowing down research and industrialisat