Crawling parallel texts (texts that are mutual
translations) from the Internet is usually done following a
brute-force approach: documents are downloaded in bulk in an unguided
process, and only a fraction of them end up leading to actual parallel cont