Offline handwriting recognition systems require cropped text line images for both training and recognition. On the one hand, the annotation of position and transcript at line level is costly to obtain. On the other hand, automatic line segmentation algorithms are prone to errors, compromising the subsequent recognition. In this paper, we propose a modification of the popular and efficient multi-dimensional long short-term memory recurrent neural networks (MDLSTM-RNNs) to enable end-to-end processing of handwritten paragraphs. More particularly, we replace the collapse layer transforming the two-dimensional representation into a sequence of predictions by a recurrent version which can recognize one line at a time. In the proposed model, a neural network performs a kind of implicit line segmentation by computing attention weights on the image representation. The experiments on paragraphs of Rimes and IAM database yield results that are competitive with those of networks trained at line level, and constitute a significant step towards end-to-end transcription of full documents.

本文提出了一种针对离线手写段落文本识别的神经网络模型，利用注意力权重计算图像表示，实现了一种隐式行分割的可训练端到端模型，试验结果表明其性能与传统基于线段的模型竞争力相当，具有将全文档转录实现的潜力。

端到端手写段落识别的联合分割和转录