Many successful deep learning architectures are equivariant to certain
transformations, which conserves parameters and improves generalization:
most famously, convolution layers are equivariant to shifts of the input. This
approach only works when practitioners know the symmetries o