structured representations, such as Bags of Words, VLAD and Fisher Vectors,
have proven highly effective at tackling complex visual recognition tasks. As
such, they have recently been incorporated into deep architectures. However,
while effective, the resulting deep structured representa