Recently, transformer-based architectures have been introduced into the single
image deraining task due to their advantage in modeling non-local information.
However, existing approaches tend to integrate global features based on a dense
self-attention strategy since it tends to use all simila