Models trained on data composed of different groups or domains can suffer
from severe performance degradation under distribution shifts. While recent
methods have largely focused on optimizing the worst-group objective, this
often comes at the expense of good performance on other group