For an image with multiple scene texts, different people may be interested in
different text information. Current text-aware image captioning models are not
able to generate distinctive captions according to various information needs.
To explore how to generate personalized text-aware