While modern large-scale datasets often consist of heterogeneous
subpopulations -- for example, multiple demographic groups or multiple text
corpora -- the standard practice of minimizing average loss fails to guarantee
uniformly low losses across all subpopulations. We propose a conve