We address the challenge of acquiring real-world manipulation skills with a
scalable framework.Inspired by the success of large-scale auto-regressive
prediction in Large Language Models (LLMs), we hold the belief that identifying
an appropriate prediction target capable of leveraging l