An emerging line of work has sought to generate plausible imagery from touch.
Existing approaches, however, tackle only narrow aspects of the visuo-tactile
synthesis problem, and their output quality lags significantly behind that of
cross-modal synthesis methods in other domains. We draw on recent advances in latent
diffusion to create a model for synthesizing images from