Large-scale datasets for natural language inference are created by presenting
crowd workers with a sentence (premise) and asking them to generate three new
sentences (hypotheses) that it entails, contradicts, or is logically neutral
with respect to. We show that, in a significant port