Implicit neural representations have recently shown promise for representing images at arbitrary resolutions. In this paper, we present a Local Implicit Transformer (LIT), which integrates the attention mechanism and a frequency encoding technique into a local implicit image function. We design a cross-scale local attention block to aggregate local features effectively. To further improve representational power, we propose a Cascaded LIT (CLIT) that exploits multi-scale features, along with a cumulative training strategy that gradually increases the upsampling scales during training. We conduct extensive experiments to validate the effectiveness of these components and to analyze various training strategies. The qualitative and quantitative results demonstrate that LIT and CLIT achieve favorable results and outperform prior works on arbitrary-scale super-resolution tasks.

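This paper introduces the Local Implicit Transformer (LIT), which combines the attention mechanism and frequency encoding with a local implicit image function and uses a cross-scale local attention block to aggregate local features effectively. To further improve representational power, it also proposes a Cascaded LIT (CLIT) that exploits multi-scale features together with a progressive training strategy. The methods achieve favorable results on arbitrary-scale super-resolution tasks and outperform previous approaches.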

Cascaded Local Implicit Transformer for Arbitrary-Scale Super-Resolution
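
The cumulative training strategy mentioned in the abstract, in which the upsampling scales are gradually increased during training, can be illustrated with a minimal sketch. The abstract does not state the actual scale ranges, stage lengths, or sampling distribution, so every name and number below (`scale_range_for_stage`, `base_max`, `step`, `epochs_per_stage`, uniform sampling) is a hypothetical choice made only for illustration, not the authors' configuration.

```python
import random


def scale_range_for_stage(stage, min_scale=1.0, base_max=4.0, step=2.0):
    """Return the (low, high) range of upsampling scales for a training stage.

    Assumption: the lower bound stays fixed and the upper bound grows by
    `step` per stage, so later stages still cover the earlier scales.
    All default values are illustrative, not taken from the paper.
    """
    if stage < 0:
        raise ValueError("stage must be non-negative")
    return (min_scale, base_max + step * stage)


def stage_for_epoch(epoch, epochs_per_stage):
    """Map an epoch index to a stage index (hypothetical fixed-length stages)."""
    if epochs_per_stage <= 0:
        raise ValueError("epochs_per_stage must be positive")
    return epoch // epochs_per_stage


def sample_scale(stage, rng):
    """Draw one upsampling scale uniformly from the current stage's range."""
    low, high = scale_range_for_stage(stage)
    return rng.uniform(low, high)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    for epoch in (0, 10, 20):
        stage = stage_for_epoch(epoch, epochs_per_stage=10)
        low, high = scale_range_for_stage(stage)
        scale = sample_scale(stage, rng)
        print(f"epoch {epoch}: stage {stage}, range [{low}, {high}], sampled x{scale:.2f}")
```

In a real training loop, the sampled scale would determine the target resolution of each batch; widening the range in stages rather than all at once is the part this sketch is meant to show.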