Protein language models have revolutionized structure prediction, but their nonlinear nature obscures how sequence representations inform the predicted structures. While sparse autoencoders (SAEs) offer a path to interpretability by learning linear representations in a high-dimensional space, their application has been limited to smaller protein language models that cannot perform structure prediction. In this work, we make two key advances: (1) we scale SAEs to ESM2-3B, the base model for ESMFold, enabling mechanistic interpretability of protein structure prediction for the first time, and (2) we adapt Matryoshka SAEs to protein language models; these SAEs learn hierarchically organized features by forcing nested groups of latents to reconstruct the input independently. We demonstrate that our Matryoshka SAEs perform comparably to or better than standard architectures. Through comprehensive evaluations, we show that SAEs trained on ESM2-3B significantly outperform those trained on smaller models on both biological concept discovery and contact map prediction. Finally, we present an initial case study demonstrating how our approach enables targeted steering of ESMFold predictions, increasing the solvent accessibility of the predicted structure while keeping the input sequence fixed. To facilitate further investigation by the broader community, we open-source our code, dataset, and pretrained models (https://github.com/johnyang101/reticular-sae) and our visualizer (https://sae.reticular.ai).
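
To make the nested-reconstruction idea concrete, the following is a minimal sketch of a Matryoshka-style reconstruction loss for a single activation vector, written in plain Python so that it is self-contained. It is an illustration under stated assumptions, not the released implementation: the ReLU encoder, the squared-error objective, the omission of any sparsity term, and all names (`encode`, `decode_prefix`, `matryoshka_loss`, `group_sizes`) are hypothetical choices made here for clarity.

```python
import random


def encode(x, W_enc, b_enc, b_dec):
    """Latents z = ReLU(W_enc (x - b_dec) + b_enc); W_enc has one row per latent.

    Assumption: a plain ReLU encoder; the actual model may use another activation.
    """
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre = [sum(w * c for w, c in zip(row, centered)) + b
           for row, b in zip(W_enc, b_enc)]
    return [max(0.0, p) for p in pre]


def decode_prefix(z, W_dec, b_dec, m):
    """Reconstruct the input from only the first m latents (one nested group)."""
    x_hat = list(b_dec)
    for j in range(m):
        for d in range(len(x_hat)):
            x_hat[d] += z[j] * W_dec[j][d]
    return x_hat


def matryoshka_loss(x, W_enc, b_enc, W_dec, b_dec, group_sizes):
    """Sum of squared reconstruction errors over nested latent prefixes.

    group_sizes is an increasing list such as [4, 8, 16]; each prefix must
    reconstruct x on its own, which pushes general features into early latents.
    Returns (total_loss, per_group_losses).
    """
    z = encode(x, W_enc, b_enc, b_dec)
    per_group = []
    for m in group_sizes:
        x_hat = decode_prefix(z, W_dec, b_dec, m)
        per_group.append(sum((a - b) ** 2 for a, b in zip(x, x_hat)))
    return sum(per_group), per_group


# Toy usage with random weights (dimensions are illustrative only).
rng = random.Random(0)
d_model, n_latents = 8, 16
x = [rng.gauss(0, 1) for _ in range(d_model)]
W_enc = [[rng.gauss(0, 0.3) for _ in range(d_model)] for _ in range(n_latents)]
W_dec = [[rng.gauss(0, 0.3) for _ in range(d_model)] for _ in range(n_latents)]
total, per_group = matryoshka_loss(
    x, W_enc, [0.0] * n_latents, W_dec, [0.0] * d_model, [4, 8, 16]
)
print(total, per_group)
```

In this sketch every prefix shares the same encoder and decoder weights, so the only thing that distinguishes a Matryoshka-style objective from a standard SAE loss is that the reconstruction error is accumulated over several nested prefixes rather than over the full dictionary alone.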

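This study addresses the interpretability problem of how sequence representations inform structure prediction in protein structure prediction. By scaling sparse autoencoders (SAEs) to the large protein language model ESM2-3B and adopting Matryoshka SAEs, which organize features hierarchically, we achieve mechanistic interpretability that was not previously possible and support targeted steering of structure predictions. The results show that SAEs trained on ESM2-3B significantly outperform SAEs trained on smaller models on biological concept discovery and contact map prediction, indicating substantial potential for application.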

Towards Interpretable Protein Structure Prediction with Sparse Autoencoders