VisualWordGrid: A Multimodal Approach to Extracting Information from Scanned Documents

October 2020

VisualWordGrid: Information Extraction From Scanned Documents Using A Multimodal Approach

Mohamed Kerroumi, Othmane Sayem, Aymen Shabou

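TL;DR: The paper introduces a new representation for scanned documents that encodes textual, visual, and layout information simultaneously in a 3-axis tensor, which serves as the input to a segmentation model. By taking the visual modality into account, the approach improves robustness on small datasets while keeping inference time fast. Tested on public and private document image datasets, it outperforms recent state-of-the-art methods.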
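To make the idea of a 3-axis tensor concrete, here is a minimal sketch of one way such a grid could be assembled: each pixel keeps its three colour channels, and pixels inside a word's bounding box additionally carry that word's embedding. This is an illustration under stated assumptions, not the authors' implementation. The names `build_grid` and `toy_embed`, the box convention, and the channel layout are all hypothetical, and plain Python lists stand in for the array library a real pipeline would use.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels, x1 and y1 exclusive


def toy_embed(word: str, dim: int = 4) -> List[float]:
    """Hypothetical stand-in for a learned word embedding (deterministic)."""
    total = sum(ord(c) for c in word.lower())
    return [((total * (i + 1)) % 97) / 97.0 for i in range(dim)]


def build_grid(
    image: Sequence[Sequence[Sequence[float]]],   # H x W x 3 colour values
    words: Sequence[Tuple[str, Box]],             # OCR output: text plus box
    embed: Callable[[str], List[float]],
    dim: int,
) -> List[List[List[float]]]:
    """Return an H x W x (3 + dim) grid: colour channels, then text channels."""
    height, width = len(image), len(image[0])
    # Start with the visual channels and zero-filled text channels (background).
    grid = [
        [list(image[y][x]) + [0.0] * dim for x in range(width)]
        for y in range(height)
    ]
    for text, (x0, y0, x1, y1) in words:
        vec = embed(text)
        assert len(vec) == dim, "embedding size must match dim"
        # Layout is encoded implicitly: the embedding is written where the word sits.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                grid[y][x][3:] = vec
    return grid


# A 4 x 6 white page with one word covering columns 0-2 of rows 1-2.
image = [[(1.0, 1.0, 1.0)] * 6 for _ in range(4)]
words = [("Total", (0, 1, 3, 3))]
grid = build_grid(image, words, toy_embed, dim=4)
```

In this sketch the text channels are zero wherever no word was detected, so a segmentation model reading the grid still sees the visual channels in those regions.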

Abstract

We introduce a novel approach to scanned document representation for the field extraction task. It allows the simultaneous encoding of the textual, visual and layout information in a 3D matrix used as an inp