In this paper, we investigate the tendency of end-to-end neural Machine
Reading Comprehension (MRC) models to match shallow patterns rather than
perform inference-oriented reasoning on RC benchmarks. We aim to test the
ability of these systems to answer questions which focus on referen