This paper considers the problem of model selection under domain shift. In this setting, it is proposed that a high maximum mean discrepancy (MMD) between the training and validation sets increases the generalisability of selected models. A data splitting algorithm based on kernel k-means clustering, which maximises this objective, is presented. The algorithm leverages linear programming to control the size, label, and (optionally) group distributions of the splits, and comes with convergence guarantees. The technique consistently outperforms alternative splitting strategies across a range of datasets and training algorithms, for both domain generalisation (DG) and unsupervised domain adaptation (UDA) tasks. Analysis also shows the MMD between the training and validation sets to be strongly rank-correlated ($\rho=0.63$) with test domain accuracy, further substantiating the validity of this approach.
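
The splitting idea can be illustrated with a minimal Python (NumPy/SciPy) sketch. It rests on several assumptions that do not come from the paper: an RBF kernel with a hand-picked bandwidth `gamma`, the biased (V-statistic) MMD estimator, and size control only. Size control is done with SciPy's `linear_sum_assignment` (an assignment problem, i.e. a linear program with integral optima), not with the paper's general linear program, and label and group constraints are omitted. The helper names `rbf_kernel`, `mmd2_biased` and `balanced_kernel_kmeans` are hypothetical. Here is why clustering serves the MMD objective. Take two clusters of fixed sizes $n_1$ and $n_2$. Their between-cluster scatter in feature space equals $\frac{n_1 n_2}{n}\,\mathrm{MMD}^2$, and the total scatter is fixed. Lowering the kernel k-means (within-cluster) objective therefore raises the MMD between the two parts.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def rbf_kernel(X, Y, gamma):
    """Gaussian (RBF) kernel matrix k(x, y) = exp(-gamma * ||x - y||^2).

    The RBF kernel and `gamma` are illustrative assumptions.
    """
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def mmd2_biased(K, mask):
    """Biased (V-statistic) squared MMD between points in `mask` and the rest."""
    Kxx = K[np.ix_(mask, mask)]
    Kyy = K[np.ix_(~mask, ~mask)]
    Kxy = K[np.ix_(mask, ~mask)]
    return Kxx.mean() + Kyy.mean() - 2.0 * Kxy.mean()


def balanced_kernel_kmeans(K, sizes, max_iter=100, seed=0):
    """Kernel k-means in which cluster c must contain exactly sizes[c] > 0 points.

    Hypothetical helper: the size constraint is enforced by solving an
    assignment problem at every iteration (no label/group constraints).
    """
    n, k = K.shape[0], len(sizes)
    assert sum(sizes) == n and min(sizes) > 0
    slot_to_cluster = np.repeat(np.arange(k), sizes)  # one slot per allowed member
    rng = np.random.default_rng(seed)
    labels = rng.permutation(slot_to_cluster)         # random balanced start
    diag = np.diag(K)
    for _ in range(max_iter):
        A = np.eye(k)[labels]                          # n x k one-hot membership
        s = A.sum(0)                                   # current cluster sizes
        KA = K @ A
        # squared feature-space distance from each point to each cluster mean:
        # K_ii - 2/|S_c| sum_j K_ij + 1/|S_c|^2 sum_{j,l} K_jl
        dist = diag[:, None] - 2.0 * KA / s + np.diag(A.T @ KA) / s ** 2
        # expand cluster c into sizes[c] identical slots and solve the assignment
        rows, cols = linear_sum_assignment(dist[:, slot_to_cluster])
        new_labels = np.empty(n, dtype=int)
        new_labels[rows] = slot_to_cluster[cols]
        if np.array_equal(new_labels, labels):         # converged
            break
        labels = new_labels
    return labels


# Usage on two synthetic blobs (made-up data, for illustration only).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-5, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
K = rbf_kernel(X, X, gamma=0.1)
labels = balanced_kernel_kmeans(K, sizes=[50, 50])
val = labels == 0                                      # cluster 0 as validation set
print(np.bincount(labels))                             # [50 50]
print(mmd2_biased(K, val), mmd2_biased(K, rng.permutation(val)))
```

Each iteration can only keep or lower the within-cluster scatter. The previous assignment is always feasible for the assignment problem, and recomputing the cluster means can only reduce the scatter further. This gives an informal reason why the loop settles, but it does not reproduce the paper's convergence guarantees.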

Clustering-Based Validation Splits for Domain Generalisation