This paper establishes a theoretical foundation for understanding the fundamental limits of AI explainability through algorithmic information theory. We formalize explainability as the approximation of a complex model by a simpler one and quantify both the approximation error and the complexity of the explanation, the latter via Kolmogorov complexity. Our key theoretical contributions are: (1) a complexity gap theorem proving that any explanation significantly simpler than the original model must differ from it on some inputs; (2) precise bounds showing that, for Lipschitz functions, explanation complexity grows exponentially with input dimension but only polynomially in the inverse error tolerance; and (3) a characterization of the gap between local and global explainability, showing that local explanations can be significantly simpler while remaining accurate in the regions that matter. We further establish a regulatory impossibility theorem proving that no governance framework can simultaneously achieve unrestricted AI capabilities, human-interpretable explanations, and negligible error. These results have implications for the design, evaluation, and oversight of explainable AI systems.
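The scaling in contribution (2) can be illustrated with a standard piecewise-constant construction. The sketch below is not taken from the paper's proofs. It assumes a target function f: [0,1]^d → [0,1] that is L-Lipschitz in the sup norm and counts the bits of a lookup-table approximation with sup-norm error at most ε. Up to lower-order terms for encoding d, L, ε and a fixed decoder, that bit count upper-bounds the Kolmogorov complexity of such an ε-accurate explanation. The function name `grid_description_bits` is hypothetical.

```python
import math


def grid_description_bits(d: int, lipschitz: float, eps: float) -> int:
    """Bits for a lookup-table approximation of an L-Lipschitz (sup-norm)
    function f: [0,1]^d -> [0,1] with sup-norm error at most eps.

    Illustrative upper-bound construction only (an assumption of this sketch).
    """
    if d < 1 or lipschitz <= 0 or not 0 < eps <= 1:
        raise ValueError("need d >= 1, lipschitz > 0, 0 < eps <= 1")
    # m cells per axis, side h = 1/m <= eps / L; every x lies within h/2 (sup norm)
    # of its cell centre, so |f(x) - f(centre)| <= L * h / 2 <= eps / 2.
    m = math.ceil(lipschitz / eps)
    # Output levels k * eps for k = 0..ceil(1/eps) cover [0, 1];
    # rounding f(centre) to the nearest level adds at most eps / 2.
    levels = math.ceil(1 / eps) + 1
    bits_per_cell = math.ceil(math.log2(levels))
    # One quantized value per cell: m^d cells in total, total error <= eps.
    return m ** d * bits_per_cell


if __name__ == "__main__":
    # Fixed eps, growing dimension: the m^d factor makes the cost exponential in d.
    for d in (1, 2, 4, 8):
        print(d, grid_description_bits(d, lipschitz=1.0, eps=0.25))
    # prints: 1 12 / 2 48 / 4 768 / 8 196608 (one pair per line)
```

For fixed d the count is O((L/ε)^d · log(1/ε)), which is polynomial in 1/ε, while for fixed ε it grows as ⌈L/ε⌉^d, exponentially in d. A tighter encoding could shrink the constants, but this sketch makes no claim about matching the paper's exact bounds.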

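This study uses algorithmic information theory to establish a theoretical foundation for understanding the fundamental limits of AI explainability. We formalize explainability as the approximation of a complex model by a simpler one and quantify both the approximation error and the complexity of the explanation. The results show that any substantially simplified explanation must differ from the original model on some inputs, and that explanation complexity grows exponentially with input dimension. These findings have important implications for the design and regulation of explainable AI systems.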

The Limits of AI Explainability: An Algorithmic Information Theory Approach