The standard recipe in transfer learning is to finetune a pretrained model on the task-specific dataset with different hyperparameter settings and pick the model with the highest accuracy on the validation dataset. Unfortunately, this yields models that do not perform well under distribution shift, e.g. when the model is given sketches of an object as input instead of photos. To address this, we propose the manifold mixing model soup, an algorithm that mixes the latent space manifolds of multiple finetuned models in an optimal way to generate a fused model. We show that the fused model gives significantly better out-of-distribution performance (+3.5% compared to the best individual model) when finetuning a CLIP model for image classification. In addition, it also provides better accuracy on the original dataset on which the finetuning was done.
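
As a rough illustration of what mixing several finetuned models into one fused model can look like, the sketch below combines the weights of two toy models component by component, using a separate set of mixing coefficients per component. Everything in it is an assumption made for illustration: the function name `mix_models`, the prefix-based split into components, the parameter names, and the coefficient values are hypothetical, and the abstract does not say how the optimal coefficients are found, so they are simply passed in here.

```python
# Hypothetical sketch: component-wise mixing of finetuned model weights.
# Parameter names, the prefix-based component split, and the coefficients
# are illustrative assumptions, not the authors' implementation.

def mix_models(models, components, coeffs):
    """Fuse models by a per-component convex combination of their weights.

    models:     list of dicts mapping parameter name -> list of floats
    components: list of name prefixes, one per component
    coeffs:     coeffs[c][m] is the weight of model m in component c
    """
    fused = {}
    for name in models[0]:
        # Find the component whose prefix matches this parameter.
        c = next(i for i, prefix in enumerate(components) if name.startswith(prefix))
        weights = coeffs[c]
        assert abs(sum(weights) - 1.0) < 1e-9, "coefficients should sum to 1"
        # Weighted sum of the same parameter across all models.
        fused[name] = [
            sum(w * m[name][k] for w, m in zip(weights, models))
            for k in range(len(models[0][name]))
        ]
    return fused


# Two toy "finetuned models" with identical parameter names and shapes.
model_a = {"encoder.w": [1.0, 2.0], "head.w": [0.0, 4.0]}
model_b = {"encoder.w": [3.0, 6.0], "head.w": [2.0, 0.0]}

fused = mix_models(
    [model_a, model_b],
    components=["encoder.", "head."],
    coeffs=[[0.5, 0.5], [0.25, 0.75]],  # assumed values, one row per component
)
print(fused)
```

In a real setting the coefficients would presumably be chosen by optimizing some criterion on held-out data rather than fixed by hand; that step is deliberately left out of this sketch.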

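We propose the manifold mixing model soup, an algorithm that generates a fused model by optimally mixing the latent space manifolds of multiple finetuned models. The fused model performs significantly better under distribution shift (+3.5% compared to the best individual model) and also achieves higher accuracy on the original dataset used for finetuning.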

The Frankenstein effect, or how to achieve better out-of-distribution performance by mixing model manifolds