For a given base class of sequence-to-next-token generators, we consider learning prompt-to-answer mappings obtained by iterating a fixed, time-invariant generator for multiple steps, thus generating a chain-of-thought, and then taking the final token as the answer. We formalize the learning problems both when the chain-of-thought is observed and when training only on prompt-answer pairs, with the chain-of-thought latent. We analyze the sample and computational complexity both in terms of general properties of the base class (e.g. its VC dimension) and for specific base classes such as linear thresholds. We present a simple base class that allows for universal representability and computationally tractable chain-of-thought learning. Central to our development is that time invariance allows for sample complexity that is independent of the length of the chain-of-thought. Attention arises naturally in our construction.
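The setup in the abstract can be illustrated with a minimal sketch. It is an assumption-laden toy, not the paper's construction: tokens are taken to be binary, the base class is taken to be linear thresholds over a fixed-size window of the most recent tokens, and the names `make_linear_threshold`, `chain_of_thought`, and `answer` are hypothetical. The same generator `f` is applied at every step (time invariance), the generated tokens form the chain-of-thought, and the final token is the answer.

```python
from typing import Callable, List, Sequence

Token = int  # assumption: binary tokens in {0, 1}
Generator = Callable[[Sequence[Token]], Token]


def make_linear_threshold(weights: Sequence[float], bias: float) -> Generator:
    """Hypothetical base-class member: a linear threshold over the last
    len(weights) tokens (left-padded with zeros if the sequence is shorter)."""
    d = len(weights)
    if d < 1:
        raise ValueError("need at least one weight")

    def f(seq: Sequence[Token]) -> Token:
        window = list(seq[-d:])
        window = [0] * (d - len(window)) + window
        score = sum(w * t for w, t in zip(weights, window)) + bias
        return 1 if score >= 0 else 0

    return f


def chain_of_thought(f: Generator, prompt: Sequence[Token], steps: int) -> List[Token]:
    """Iterate the same fixed generator `steps` times, appending each new
    token to the sequence; return only the generated tokens."""
    seq = list(prompt)
    for _ in range(steps):
        seq.append(f(seq))
    return seq[len(prompt):]


def answer(f: Generator, prompt: Sequence[Token], steps: int) -> Token:
    """The prompt-to-answer mapping: the final token of the chain-of-thought.
    Requires steps >= 1."""
    return chain_of_thought(f, prompt, steps)[-1]
```

In this sketch, learning with the chain-of-thought observed would correspond to seeing the full output of `chain_of_thought` during training, while the latent setting would expose only `answer` for each prompt.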

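This work addresses how to learn prompt-to-answer mappings over a given base class of sequence-to-next-token generators, in particular mappings obtained by iterating a fixed generator for multiple steps to produce a chain-of-thought. We present a simple base class that supports universal representability and computationally tractable chain-of-thought learning. The key finding is that time invariance makes the sample complexity independent of the length of the chain-of-thought.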

A Theory of Learning with Autoregressive Chain of Thought