Text-to-image diffusion models have shown remarkable success in generating a personalized subject from a few reference images. However, current methods struggle to handle multiple subjects simultaneously, often producing mixed identities that combine attributes from different subjects. In this work, we present MuDI, a novel framework that enables multi-subject personalization by effectively decoupling the identities of multiple subjects. Our main idea is to use segmented subjects generated by the Segment Anything Model in both training and inference: as a form of data augmentation during training and as initialization for the generation process. Our experiments demonstrate that MuDI can produce high-quality personalized images without identity mixing, even for highly similar subjects, as shown in Figure 1. In human evaluation, MuDI achieves twice as many successes as existing baselines at personalizing multiple subjects without identity mixing, and is preferred in over 70% of comparisons against the strongest baseline. More results are available at https://mudi-t2i.github.io/.
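
The abstract does not specify how the segmented subjects are combined, so the following is only a minimal sketch of the general idea of composing segmented subjects into one image, not MuDI's actual procedure. It assumes each subject already comes with a binary mask (for example, one produced by the Segment Anything Model); the function name `compose_segmented_subjects` and the slot-based, non-overlapping layout are hypothetical choices made for illustration.

```python
import numpy as np


def compose_segmented_subjects(subjects, masks, canvas_hw=(64, 64), rng=None):
    """Paste masked subject crops onto a blank canvas without overlap.

    Hypothetical helper for illustration only (not from the MuDI paper).
    subjects: list of (h, w, 3) uint8 arrays.
    masks:    list of (h, w) bool arrays, True where the subject is.
    Returns the composite image and an integer label map
    (0 = background, i + 1 = pixels of subject i).
    """
    rng = np.random.default_rng() if rng is None else rng
    H, W = canvas_hw
    canvas = np.zeros((H, W, 3), dtype=np.uint8)
    labels = np.zeros((H, W), dtype=np.int32)

    # One vertical slot per subject, so subjects can never overlap.
    slot_w = W // len(subjects)
    # Random left-to-right order of the subjects.
    order = rng.permutation(len(subjects))

    for slot, idx in enumerate(order):
        img, mask = subjects[idx], masks[idx]
        h, w = mask.shape
        if h > H or w > slot_w:
            raise ValueError("subject crop does not fit in its slot")
        # Random placement inside the slot.
        top = int(rng.integers(0, H - h + 1))
        left = slot * slot_w + int(rng.integers(0, slot_w - w + 1))
        # Copy only the masked (subject) pixels; background stays blank.
        canvas[top:top + h, left:left + w][mask] = img[mask]
        labels[top:top + h, left:left + w][mask] = idx + 1

    return canvas, labels
```

Because each subject is confined to its own slot, the label map records unambiguously which pixels belong to which identity. In this sketch, that separation is what stands in for "decoupled" subjects; how such composites are actually used for training augmentation or for initializing generation is left to the paper.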

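Using segmented subjects generated by the Segment Anything Model, we propose MuDI, a new framework for multi-subject personalized image generation that avoids mixing attributes across subjects. Experiments show that MuDI produces high-quality personalized images and, in human evaluation, achieves twice the success rate of strong baselines and is preferred in over 70% of comparisons.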

Identity Decoupling for Multi-Subject Personalization of Text-to-Image Models