attention networks have proven to be an effective approach for embedding
categorical inference within a deep neural network. However, for many tasks we
may want to model richer structural dependencies without abandoning end-to-end
training. In this work, we experiment with incorporatin