Dynamically synthesizing speech that actively responds to a listening
head is critical in face-to-face interaction. For example, the speaker
could take advantage of the listener's facial expressions to adjust tone,
stressed syllables, or pauses. In this work, we pre