We introduce BLIP3-KALE, a dataset of 218 million image-text pairs that bridges the gap between descriptive synthetic captions and factual web-scale alt-text. KALE augments synthetic dense image captions with web-scale alt-text to generate factually grounded image captions. Our two-sta