Deep Recurrent Neural Network architectures, though remarkably capable at
modeling sequences, lack an intuitive high-level spatio-temporal structure.
That is while many problems in computer vision inherently have an underlying
high-level structure and can benefit from it. spatio-temporal grap