This paper describes PULSAR, our system submission to the ImageCLEF 2023 MEDIQA-Sum task on summarising patient-doctor dialogues into clinical records. The proposed framework uses domain-specific pre-training to produce a specialised language model, which is then trained on task-specific natural data augmented with synthetic data generated by a black-box LLM. We find limited evidence for the efficacy of domain-specific pre-training and data augmentation, while scaling up the language model yields the largest performance gains. Our approach was ranked second and third among 13 submissions on task B of the challenge. Our code is available at https://github.com/yuping-wu/PULSAR.

PULSAR at MEDIQA-Sum 2023: Large Language Models Augmented by Synthetic Dialogue Convert Patient Dialogues to Medical Records