variational auto-encoders (VAEs) are capable of learning latent
representations for high dimensional data. However, due to the i.i.d.
assumption, VAEs only optimize the singleton variational distributions and fail
to account for the correlations between data points, which might be cruc