BriefGPT.xyz
Mar, 2014
Extraction of Line Word Character Segments Directly from Run Length Compressed Printed Text Documents
Mohammed Javed, P. Nagabhushan, B. B. Chaudhuri
TL;DR
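Most real-world documents are stored in compressed form, yet segmentation in the OCR pre-processing stage is traditionally performed on uncompressed documents. This paper addresses that gap by proposing line-, word- and character-level segmentation methods that operate directly on run-length compressed text documents.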
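The following minimal Python sketch shows how segmentation can work on run-length data without decompressing it. It is not the authors' algorithm. It assumes a hypothetical encoding in which each row is a list of run lengths alternating white, black, white, and so on, always starting with a white run that may have length zero. It also uses plain projection profiles. Line boundaries come from the horizontal profile, which can be read directly off the black run lengths. Word and character boundaries come from a vertical profile built from run start and end positions with a difference array, so no row is expanded pixel by pixel. The names `segment_lines`, `column_profile` and `segment_columns`, and the `min_gap` threshold, are illustrative choices only.

```python
from typing import List, Tuple

# Hypothetical encoding (assumption): each row is a list of run lengths that
# alternate white, black, white, ... and always begin with a white run
# (possibly of length 0). Black runs therefore sit at odd indices.
RLERow = List[int]


def black_pixels(row: RLERow) -> int:
    """Number of black pixels in one row, summed from its black runs."""
    return sum(row[1::2])


def segment_lines(rows: List[RLERow]) -> List[Tuple[int, int]]:
    """Return (start_row, end_row) pairs (end exclusive) for each text line.

    Uses the horizontal projection profile (black pixels per row), which is
    read straight from the run lengths without decoding the rows.
    """
    lines = []
    start = None
    for y, row in enumerate(rows):
        has_ink = black_pixels(row) > 0
        if has_ink and start is None:
            start = y  # first inked row of a new line
        elif not has_ink and start is not None:
            lines.append((start, y))  # a blank row closes the current line
            start = None
    if start is not None:
        lines.append((start, len(rows)))
    return lines


def column_profile(rows: List[RLERow], width: int) -> List[int]:
    """Vertical projection profile (black pixels per column) of a band of rows.

    Each black run adds +1 at its first column and -1 just past its last
    column in a difference array; a running sum yields the per-column counts.
    """
    diff = [0] * (width + 1)
    for row in rows:
        x = 0
        for i, run in enumerate(row):
            if i % 2 == 1 and run > 0:  # odd index means a black run
                diff[x] += 1
                diff[x + run] -= 1
            x += run
    profile = []
    acc = 0
    for x in range(width):
        acc += diff[x]
        profile.append(acc)
    return profile


def segment_columns(profile: List[int], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Split a profile into (start_col, end_col) segments (end exclusive).

    A segment closes once at least `min_gap` consecutive empty columns follow
    it: a small `min_gap` separates characters, a larger one separates words.
    """
    segments = []
    start = None
    end = 0
    gap = 0
    for x, count in enumerate(profile):
        if count > 0:
            if start is None:
                start = x
            gap = 0
            end = x + 1  # one past the last inked column seen so far
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                segments.append((start, end))
                start = None
    if start is not None:
        segments.append((start, end))
    return segments
```

For example, with `rows = [[10], [0, 2, 2, 1, 5], [0, 2, 2, 1, 5], [10], [1, 3, 6]]`, `segment_lines(rows)` returns `[(1, 3), (4, 5)]`. For the first line, `column_profile(rows[1:3], 10)` gives `[2, 2, 0, 0, 2, 0, 0, 0, 0, 0]`. With `min_gap=1` that profile splits into two character-like segments, `[(0, 2), (4, 5)]`. With `min_gap=3` it merges into one word-like segment, `[(0, 5)]`.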
Abstract
Segmentation of a text document into lines, words and characters, which is considered to be the crucial pre-processing stage in optical character recognition (OCR), is traditionally carried out on uncompressed documents…