TL;DR通过对 DNA 序列的新叠词标记方法和 RandomMask 技术进行预训练,提高了生命科学领域的下游任务性能。
Abstract
With the success of large-scale pretraining in NLP, there is an increasing
trend of applying it to the domain of life sciences. In particular, pretraining
methods based on dna sequences have garnered growing attention due to their
potential to capture generic information about genes. H