Automated question generation is an important approach to enable personalisation of English comprehension assessment. Recently, transformer-based pretrained language models have demonstrated the ability to produce appropriate questions from a context paragraph. Typically, these systems are evaluated against a reference set of manually generated questions using n-gram based metrics, or manual qualitative assessment. Here, we focus on a fully automated multiple-choice question generation (MCQG) system where both the question and possible answers must be generated from the context paragraph. Applying n-gram based approaches is challenging for this form of system as the reference set is unlikely to capture the full range of possible questions and answer options. Conversely manual assessment scales poorly and is expensive for MCQG system development. In this work, we propose a set of performance criteria that assess different aspects of the generated multiple-choice questions of interest. These qualities include: grammatical correctness, answerability, diversity and complexity. Initial systems for each of these metrics are described, and individually evaluated on standard multiple-choice reading comprehension corpora.

本文提出一种多选题智能生成系统，利用预训练的语言模型，根据规定的性能指标，通过读取理解语料库来生成语法准确，具有可回答性、多样性和复杂性的问题及选项。

多项选择题生成：面向自动化评估框架的发展