State-of-the-art deep reading comprehension models are dominated by recurrent
neural nets. Their sequential nature is a natural fit for language, but it also
precludes parallelization within an instances and often becomes the bottleneck
for deploying such models to latency critical sce