BriefGPT.xyz
Dec, 2023
Compress & Align: Curating Image-Text Data with Human Knowledge
Lei Zhang, Fangxun Shu, Sucheng Ren, Bingchen Zhao, Hao Jiang...
TL;DR
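This study applies an AI algorithm to compress image-text data with high quality, using a trained reward model as a human-like judge to filter out misaligned or low-quality image-text pairs.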
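The filtering step described in the TL;DR can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `ImageTextPair` type, the `toy_reward` scorer (simple word overlap standing in for a trained reward model), and the threshold value are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List


@dataclass(frozen=True)
class ImageTextPair:
    # Hypothetical record: `tags` stands in for the image content.
    image_id: str
    tags: FrozenSet[str]
    caption: str


def toy_reward(pair: ImageTextPair) -> float:
    """Placeholder for a trained reward model (assumption).

    Scores a pair by the fraction of caption words that appear
    in the image tags; an empty caption scores 0.0.
    """
    words = pair.caption.lower().split()
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in pair.tags)
    return hits / len(words)


def filter_pairs(
    pairs: List[ImageTextPair],
    score_fn: Callable[[ImageTextPair], float],
    threshold: float,
) -> List[ImageTextPair]:
    """Keep only the pairs whose score is at least `threshold`."""
    return [p for p in pairs if score_fn(p) >= threshold]


pairs = [
    ImageTextPair("img_001", frozenset({"dog", "grass", "ball"}),
                  "dog chasing ball on grass"),
    ImageTextPair("img_002", frozenset({"cat", "sofa"}),
                  "click here for free shipping"),
]

# Threshold of 0.5 is an arbitrary illustrative choice.
kept = filter_pairs(pairs, toy_reward, threshold=0.5)
print([p.image_id for p in kept])
```

In this toy run the well-aligned caption is kept and the unrelated one is dropped. In practice the scorer would be a learned model, and the threshold would control how aggressively the dataset is compressed.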
Abstract
The massive growth of image-text data through web crawling inherently presents the challenge of variability in data quality. This paper introduces a novel algorithm, rooted in human knowledge, to compress this va