Traditional syntax models typically leverage part-of-speech (POS) information
by constructing features from hand-tuned templates. We demonstrate that a
better approach is to use POS tags as a regularizer of learned
representations. We propose a simple method for learning a stacked