self-supervised learning (SSL) models use only the intrinsic structure of a
given signal, independent of its acoustic domain, to extract essential
information from the input to an embedding space. This implies that the utility
of such representations is not limited to modeling human sp