To build robust question-answering systems, we need the ability to verify
whether answers to questions are truly correct, not just "good enough" in the
context of imperfect QA datasets. We explore the use of natural language
inference (NLI) as a way to achieve this goal, as NLI inheren