To date, most abstractive summarisation models have relied on variants of the
negative log-likelihood (NLL) as their training objective. In some cases,
reinforcement learning has been added to train the models with an objective
that is closer to their evaluation measures (e.g. ROUGE). However, the choice
of reward function within the reinforcement learning approach can play a key
role in performance and is still partially unexplored. For this reason, in
this paper, we propose two reward functions for the task of abstractive
summarisation: the first function, referred to as RwB-Hinge, dynamically
selects the samples for the gradient update. The second function, nicknamed
RISK, leverages a small pool of strong candidates to inform the reward. In the
experiments, we probe the proposed approach by fine-tuning an NLL-pretrained
model over nine summarisation datasets of diverse size and nature. The
experimental results show a consistent improvement over the negative
log-likelihood baselines.

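This study proposes two reward functions for the task of abstractive summarisation: RwB-Hinge and RISK. The experimental results show that these functions achieve a consistent performance improvement over the NLL baselines.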