While modern text-to-speech (TTS) systems can produce speech that is rated
highly in subjective evaluations, the distance between the real and synthetic
speech distributions remains understudied. Here, we use the term \textit{distribution}
to mean the sample space of all possible real spee