BriefGPT.xyz
Oct, 2022
Radically Lower Data-Labeling Costs for Visually Rich Document Extraction Models
Yichao Zhou, James B. Wendt, Navneet Potti, Jing Xie, Sandeep Tata
TL;DR
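The paper proposes selective labeling combined with active learning to cut the cost of annotating samples whose extractions can already be predicted. Experiments show that, compared with full annotation, the approach lowers labeling cost by 10x with no loss in accuracy, and that it applies to documents from different domains.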
Abstract
A key bottleneck in building automatic extraction models for visually rich documents like invoices is the cost of acquiring the several thousand high-quality labeled documents that are needed to train a model wit