Variational autoencoders (VAEs), as well as other generative models, have
been shown to be efficient and accurate for capturing the latent structure of
vast amounts of complex high-dimensional data. However, existing VAEs still
cannot directly handle data that are heterogeneous (mixed