encoder-decoder models have achieved remarkable success in abstractive text
summarization, which aims to compress one or more documents into a shorter
version without the loss of the essential content. Unfortunately, these models
mostly suffer a discrepancy between training and inferen