We investigate pre-training techniques for abstractive multi-document summarization (MDS), which is much less studied than single-document summarization. Though recent work has demonstrated the effectiveness of highlighting information salience in pre-training strategy design, it struggles to generate abstractive and reflective summaries, which are critical properties for MDS. To this end, we present PELMS, a pre-trained model that uses objectives based on semantic coherence heuristics and faithfulness constraints with unlabeled multi-document inputs to promote the generation of concise, fluent, and faithful summaries. To support the training of PELMS, we compile MultiPT, a multi-document pre-training corpus in which over 93 million documents form more than 3 million unlabeled topic-centric document clusters, covering diverse genres such as product reviews, news, and general knowledge. We perform extensive evaluation of PELMS in low-shot settings on a wide range of MDS datasets. Our approach consistently outperforms competitive comparisons with respect to overall informativeness, abstractiveness, coherence, and faithfulness.

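We study pre-training techniques for abstractive multi-document summarization and present PELMS, a pre-trained model whose objectives are based on semantic coherence and faithfulness constraints over unlabeled multi-document inputs, promoting the generation of concise, fluent, and faithful summaries. To support its training, we compile MultiPT, a multi-document pre-training corpus in which over 93 million documents form more than 3 million unlabeled topic-centric document clusters, covering genres such as product reviews, news, and general knowledge. We evaluate PELMS extensively in low-shot settings on multiple datasets and find that it consistently outperforms competing approaches in overall informativeness, abstractiveness, coherence, and faithfulness.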

PELMS: Pre-training for Effective Low-Shot Multi-Document Summarization