Susie Xi Rao, Johannes Rausch, Peter Egger, Ce Zhang
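TL;DR: This paper introduces TableParser, a system that parses table structures in both native PDFs and scanned images with high precision. It also provides a spreadsheet-based weak supervision mechanism and a pipeline that enables table parsing, to support further research.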
Abstract
Tables have long been a standard structure for storing data. Tabular data is now stored physically in several formats, of which PDFs, images, spreadsheets, and CSVs are leading examples. Being able to parse table structures and extract content bounded by these structures is of high i