The finetuning of pretrained transformer-based language generation models is
typically conducted in an end-to-end manner, where the model learns to attend
to relevant parts of the input by itself. However, there is no mechanism to
directly control the model's focus. This wo