Many tasks aim to measure machine reading comprehension (MRC), often focusing
on question types presumed to be difficult. Rarely, however, do task designers
start by considering what systems should in fact comprehend. In this paper we
make two key contributions. First, we argue that ex