While sorting is an important procedure in computer science, the argsort
operator, which takes a vector as input and returns its sorting permutation,
has a discrete image and thus zero gradients almost everywhere. This prohibits
end-to-end, gradient-based learning of models that rely