Recent work has aimed to capture nuances of human behavior by using LLMs to simulate responses from particular demographics in settings like social science experiments and public opinion surveys. However, there are currently no established ways to discuss or evaluate the quality of suc