Gradient-based meta-learners such as MAML learn a meta-prior from similar tasks
so that they can adapt to novel tasks from the same distribution with only a few
gradient updates. One important limitation of such frameworks is that they seek
a common initialization shared across the entire task distribution,
substantially limiting the diversity of the task distributio