We propose a novel method that uses both audio and a low-resolution image to
perform extreme face super-resolution (a 16x increase in input resolution). When
the resolution of the input image is very low (e.g., 8x8 pixels), the loss of
information is so dire that important details of the or