Recent research (arXiv:2310.11453, arXiv:2402.17764) has proposed binary and
ternary transformer networks as a way to significantly reduce the memory
footprint and improve the inference speed of Large Language Models (LLMs)
while maintaining
accuracy. In this work, we apply techniques from mechanistic interpretability