Talking face generation has attracted considerable attention in the computer vision
community, with applications including AR/VR, teleconferencing, digital
assistants, and avatars. Traditional methods are mainly audio-driven and
have to deal with the inevitable resource-intensi