Model-based Cleaning of the QUILT-1M Pathology Dataset for Text-Conditional Image Synthesis

April 2024

Marc Aubreville, Jonathan Ganz, Jonas Ammeling, Christopher C. Kaltenecker, Christof A. Bertram

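TL;DR: Using an automated pipeline and semantic-alignment filtering of image-text pairs, we find that removing common impurities from the QUILT-1M dataset significantly improves image fidelity in text-to-image synthesis.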
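The TL;DR names semantic alignment between an image and its caption as one filtering criterion. As an illustration only, the sketch below assumes every image and its caption have already been embedded in a shared vector space by some joint image-text encoder (for example a CLIP-style model), and keeps the pairs whose cosine similarity reaches a threshold. The function names `cosine_similarity` and `filter_by_alignment` and the 0.3 threshold are hypothetical; this is a generic alignment filter, not the authors' published pipeline.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat degenerate (all-zero) embeddings as unaligned
    return dot / (norm_a * norm_b)


def filter_by_alignment(
    image_embs: List[Sequence[float]],
    text_embs: List[Sequence[float]],
    threshold: float = 0.3,  # hypothetical cut-off, not a value from the paper
) -> List[int]:
    """Return the indices of image-text pairs aligned closely enough to keep."""
    if len(image_embs) != len(text_embs):
        raise ValueError("expected one text embedding per image embedding")
    return [
        i
        for i, (img, txt) in enumerate(zip(image_embs, text_embs))
        if cosine_similarity(img, txt) >= threshold
    ]


if __name__ == "__main__":
    # Toy 2-D "embeddings"; real encoders produce hundreds of dimensions.
    images = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    captions = [[1.0, 0.1], [1.0, 0.0], [0.0, 1.0]]
    print(filter_by_alignment(images, captions))  # pair 1 is orthogonal, so it is dropped
```

In practice the threshold would have to be tuned on held-out pairs, since the typical similarity range depends on the encoder used.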

Abstract

The QUILT-1M dataset is the first openly available dataset containing images harvested from various online sources. While it provides a huge data variety, its image quality and composition are highly heterogeneous, which impacts its utility for …