The ability to map descriptions of scenes to 3D geometric representations has many applications in areas such as art, education, and robotics. However, prior work on the text-to-3D scene generation task has relied on manually specified object categories and the language that identifies them. We