Music generation has generally focused on either creating scores or
interpreting them. We discuss differences between these two problems and
propose that, in fact, it may be valuable to work in the space of direct $\it
performance$ generation: jointly predicting the notes $\it and