Capturing long-range dependencies in feature representations is crucial for
many visual recognition tasks. Despite recent successes of deep convolutional
networks, it remains challenging to model non-local context relations between
visual features. A promising strategy is to model the