We propose an unsupervised, mid-level representation for a generative model
of scenes. The representation is mid-level in that it is neither per-pixel nor
per-image; rather, scenes are modeled as a collection of spatial, depth-ordered
"blobs" of features. Blobs are differentiably place