This work digs into a root question in human perception: can face geometry be
gleaned from one's voice? Previous works that study this question only adopt
developments in image synthesis and convert voices into face images to show
correlations, but working in the image domain unavoidably involves predicting
attributes that voices cannot hint at, including faci