Apr, 2024
HRVDA: High-Resolution Visual Document Assistant
Chaohu Liu, Kun Yin, Haoyu Cao, Xinghua Jiang, Xin Li...
TL;DR
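This paper proposes the High-Resolution Visual Document Assistant (HRVDA), a model that uses a content filtering mechanism and an instruction filtering module to filter out content-agnostic and instruction-agnostic visual tokens, respectively. This makes model training and inference on high-resolution images efficient, and the model achieves state-of-the-art performance on multiple document understanding datasets.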
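The two-stage token filtering idea in the summary can be illustrated with a minimal sketch. This is not the HRVDA implementation: the summary does not say how the filters score or select tokens, so the function names (`keep_top`, `two_stage_filter`), the per-token scores, and the keep ratios below are all hypothetical, and a simple top-k selection stands in for whatever learned modules the model actually uses.

```python
from typing import List, Sequence


def keep_top(indices: Sequence[int], scores: Sequence[float],
             keep_ratio: float) -> List[int]:
    """Keep the highest-scoring fraction of `indices`, in original order.

    Hypothetical helper: `scores[i]` is an assumed relevance score for
    visual token i, not something defined by the source.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    if not indices:
        return []
    k = max(1, int(len(indices) * keep_ratio))
    ranked = sorted(indices, key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def two_stage_filter(content_scores: Sequence[float],
                     instruction_scores: Sequence[float],
                     content_keep: float = 0.5,
                     instruction_keep: float = 0.5) -> List[int]:
    """Return indices of visual tokens that survive both filters.

    Stage 1 drops tokens with low (assumed) content scores.
    Stage 2 drops survivors with low (assumed) instruction scores.
    """
    if len(content_scores) != len(instruction_scores):
        raise ValueError("score lists must have the same length")
    all_indices = list(range(len(content_scores)))
    after_content = keep_top(all_indices, content_scores, content_keep)
    return keep_top(after_content, instruction_scores, instruction_keep)


if __name__ == "__main__":
    # Eight toy visual tokens with made-up scores.
    content = [0.9, 0.1, 0.8, 0.2, 0.7, 0.05, 0.6, 0.3]
    instruction = [0.1, 0.9, 0.7, 0.9, 0.2, 0.9, 0.8, 0.9]
    print(two_stage_filter(content, instruction))
```

With these toy scores, half of the tokens are dropped at each stage, so the language model would see a quarter of the original visual tokens. That reduction in sequence length is the efficiency argument the summary makes for high-resolution inputs.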
Abstract
Leveraging vast training data, multimodal large language models (MLLMs) have demonstrated formidable general visual comprehension capabilities and achieved remarkable performance across various tasks. However, their performance in