We introduce a large dataset of narrative texts and questions about these texts, intended to be used in a machine comprehension task that requires reasoning using commonsense knowledge. Our dataset complements similar datasets in that we focus on stories about everyday activities, such as going to the movies or working in the garden, and that the questions require commonsense knowledge, or more specifically, script knowledge, to be answered. We show that our mode of data collection via crowdsourcing results in a substantial amount of such inference questions. The dataset forms the basis of a shared task on commonsense and script knowledge organized at SemEval 2018 and provides challenging test cases for the broader natural language understanding community.

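This work introduces a large dataset of narrative texts and associated questions for a machine comprehension task that requires reasoning with commonsense and script knowledge. It differs from similar existing datasets in that it focuses on stories about everyday activities and in that its questions require commonsense knowledge, or more specifically script knowledge, to answer. Collected via crowdsourcing, the dataset contains a substantial number of questions that require actual inference. It is used in the SemEval 2018 shared task on commonsense and script knowledge and provides challenging test cases for the broader natural language understanding community.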

MCScript: A Novel Dataset for Assessing Machine Comprehension Using Script Knowledge