Nov, 2023

Multimodal In-Context Learning Enables an Adaptive Scene Text Recognizer

TL;DR: Scene text recognition (STR) in the wild is challenging because of domain variations, font diversity, shape deformations, etc. Recent studies show that large language models (LLMs) can learn from a few demonstration examples through In-Context Learning (ICL). However, using LLMs as text recognizers is resource-intensive. To address this, the paper introduces E$^2$STR, an STR model trained with context-rich scene text sequences, which demonstrates effective ICL capabilities with a regular-sized model and outperforms fine-tuned approaches.