Current vision-language models (VLMs) show exceptional abilities across diverse tasks, such as visual question answering. To enhance User Experience, recent studies investigate VLM personalization to understand user-provided concepts. However, they mainly focus on single-concept person