Multiple data types naturally co-occur when describing real-world phenomena
and learning from them is a long-standing goal in machine learning research.
However, existing self-supervised generative models approximating an ELBO are
not able to fulfill all desired requirements of multimo