To the best of our knowledge, we present the first live system that generates
personalized photorealistic talking-head animation driven only by audio signals
at over 30 fps. Our system contains three stages. The first stage is a deep
neural network that extracts deep audio features along