As human-robot interaction (HRI) systems advance, it becomes harder to
evaluate and understand their strengths and limitations in different
environments and with different users. To this end, previous methods
have algorithmically generated diverse scenarios tha