Neural text classification models typically treat output labels as
categorical variables that lack descriptions and semantics. This ties their
parametrization to the size of the label set; hence, they can neither
scale to large label sets nor generalize to unseen labels.