State-of-the-art abstractive summarization systems frequently hallucinate
content that is not supported by the source document, mainly due to noise in
the training dataset. Existing methods opt to drop the noisy samples or tokens
from the training set entirely, reducing the effective t