In this paper, we propose a novel architecture called Composition Attention
Grammars (CAGs) that recursively compose subtrees into a single vector
representation with a composition function and selectively attend to previous
structural information with a self-attention mechanism. We i