Despite the unprecedented success of text-to-image diffusion models,
controlling the number of depicted objects using text is surprisingly hard.
Such control is important for applications ranging from technical documents to
children's books to illustrated cooking recipes. Generating object-c