self-attentive transformer models have recently been shown to solve the next
item recommendation task very efficiently. The learned attention weights
capture sequential dynamics in user behavior and generalize well. Motivated by
the special structure of learned parameter space, we ques