The Information Bottleneck of Chain-of-Thought and How Latent CoT Overcomes It

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Chain-of-Thought, Latent CoT, Large language model
Abstract:

Chain-of-thought (CoT) has become the de facto paradigm for large language models (LLMs) to solve complex reasoning tasks. However, due to the sequential nature of token generation, inference time can become prohibitive when the CoT is exceedingly long. This paper identifies a fundamental \emph{information bottleneck} that can force the CoT to be long: although each forward pass activates a vast number of neurons, the information the model ultimately writes down is limited to a single token, so the model must produce many more CoT steps than necessary. We first establish this bottleneck theoretically by showing that for some natural problems, such as pointer chasing and computing parity, either single-layer transformers or constant-layer finite-precision transformers require a rather long CoT. We then show that for these same problems, allowing the transformer to write high-dimensional embeddings to the CoT (i.e., using latent CoT) significantly reduces the CoT length, establishing a provable theoretical benefit of latent CoT. We further validate our theory with controlled experiments: training a small transformer to simulate Conway's Game of Life with latent CoT, we vary the per-step write bandwidth to the latent CoT and observe a sharp success threshold proportional to the board size.
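To make the abstract's bandwidth-threshold intuition concrete, the following is a minimal sketch, not the paper's experimental setup: one synchronous Game of Life update, together with the number of bits a per-step latent write must carry to preserve the board state. The board size, the non-wrapping edge convention, and the one-bit-per-cell accounting are illustrative assumptions.

```python
from itertools import product

def life_step(board):
    # One synchronous Game of Life update on an n x n grid (non-wrapping edges).
    n = len(board)
    nxt = [[0] * n for _ in range(n)]
    for i, j in product(range(n), range(n)):
        live = sum(
            board[i + di][j + dj]
            for di in (-1, 0, 1) for dj in (-1, 0, 1)
            if (di, dj) != (0, 0) and 0 <= i + di < n and 0 <= j + dj < n
        )
        nxt[i][j] = 1 if live == 3 or (board[i][j] and live == 2) else 0
    return nxt

def bits_needed(n):
    # Carrying an n x n binary board across a step takes one bit per cell,
    # so a latent write narrower than n*n bits must discard board state.
    return n * n

# A 2x2 "block" still life stays fixed under life_step.
board = [[0, 0, 0, 0],
         [0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
assert life_step(board) == board
print(bits_needed(4))  # 16 bits per step for a 4x4 board
```

Under this accounting, the information that must cross each step grows with the board, which is consistent with the reported success threshold scaling with board size.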

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper establishes theoretical lower bounds on chain-of-thought length for token-based reasoning and demonstrates that latent representations can provably reduce this length. It resides in the 'Information Bottleneck and Capacity Analysis' leaf under 'Theoretical Foundations and Analysis', where it is currently the sole occupant; the taxonomy spans 50 papers in total. This positioning reflects a sparse theoretical niche within a field dominated by empirical compression methods and architectural innovations, suggesting the work addresses a relatively underexplored formal angle on reasoning efficiency.

The taxonomy reveals that most related work clusters in empirical branches: 'Compression of Explicit Chain-of-Thought' contains six papers on distillation and steering, 'Latent Space Reasoning Frameworks' holds six papers on continuous embedding prediction and implicit reasoning, and 'Hybrid Token-Latent Reasoning Architectures' includes seven papers on contemplation tokens and adaptive computation. The paper's theoretical sibling leaves—'Learning Theory for CoT Reasoning' and 'Architectural Depth and Looping'—each contain one paper, indicating that formal analysis of reasoning capacity remains less developed than method design. The scope and exclude notes clarify that this work focuses on information constraints rather than learning complexity or architectural depth.

Among 26 candidates examined, the analysis found limited prior overlap. Contribution A (information bottleneck identification) examined 10 candidates with 1 potential refutation; Contribution B (theoretical lower bounds) examined 6 candidates with 1 potential refutation; Contribution C (latent CoT benefits) examined 10 candidates with 0 refutations. The small number of refutable candidates suggests that within this limited search scope, the theoretical framing and formal bounds appear relatively novel. However, the search examined only top-K semantic matches and citations, not the full theoretical computer science or complexity theory literature, so stronger prior results may exist outside this scope.

Given the sparse theoretical branch and limited refutations among 26 examined candidates, the work appears to occupy a distinct formal niche. The analysis does not cover exhaustive complexity theory literature or all transformer expressiveness studies, so the novelty assessment reflects only the surveyed reasoning-focused papers. The theoretical contributions seem less crowded than the empirical compression and latent reasoning methods that dominate neighboring taxonomy branches.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 26
Refutable Papers: 2

Research Landscape Overview

Core task: Reducing chain-of-thought length through latent reasoning representations. The field addresses the computational and efficiency challenges of explicit chain-of-thought (CoT) reasoning by exploring how models can perform multi-step inference in compressed or implicit forms. The taxonomy reveals several complementary directions:

- Latent Space Reasoning Frameworks develop architectures that operate entirely in hidden representations (e.g., Hidden CoT Decoding[5], Reasoning in Dark[41]).
- Compression of Explicit Chain-of-Thought focuses on distilling verbose reasoning into shorter token sequences (Compressed CoT[1], Concise CoT Distillation[49]).
- Hybrid Token-Latent Reasoning Architectures blend explicit and implicit steps (CODI[2], System-1.5 Reasoning[20]).
- Reinforcement Learning for Latent Reasoning Optimization tunes latent policies to balance efficiency and accuracy (Hybrid Latent RL[36]).
- Multimodal Latent Reasoning extends these ideas to vision and other modalities (Latent CoT Driving[23], Dynamic Multimodal Interleaving[15]).
- Theoretical Foundations and Analysis examines capacity, information flow, and expressiveness (Information Bottleneck CoT[0], Autoregressive CoT Theory[40]).
- Meta-Level Reasoning Control dynamically adjusts reasoning depth (ReflCtrl[24], Adaptive Latent Reasoning[34]).

Surveys and broad overviews (Latent Reasoning Survey[3], Large Reasoning Models Survey[16]) synthesize these threads, while peripheral branches capture tangentially related work. A central tension runs through many branches: how much reasoning can be internalized without sacrificing interpretability or performance. Works like Hidden Thinking[8] and Think Silently Fast[21] push toward fully opaque latent computation, while Compressed CoT[1] and Self-Training Concise Reasoning[13] retain token-level traces in condensed form.
Information Bottleneck CoT[0] sits within the Theoretical Foundations and Analysis branch, specifically under Information Bottleneck and Capacity Analysis, where it formalizes the trade-off between reasoning compression and task accuracy using information-theoretic principles. This theoretical lens complements empirical compression methods (Compressed CoT[1]) and architectural innovations (CODI[2]) by providing rigorous bounds on what latent representations can achieve. Compared to neighboring theoretical work like Autoregressive CoT Theory[40] or Latency-Response Theory[38], Information Bottleneck CoT[0] emphasizes capacity constraints and optimal encoding, offering a principled framework for understanding when and why latent reasoning succeeds or fails.

Claimed Contributions

Identification of the information bottleneck in chain-of-thought reasoning

The authors identify that token-based chain-of-thought reasoning suffers from an information bottleneck where each forward pass can only write O(log |V|) bits of information (a single token) to the transcript, forcing models to use many more CoT steps than necessary despite having high-dimensional internal representations.

10 retrieved papers · Can Refute
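The gap this contribution describes, O(log |V|) bits per token versus a d-dimensional write, can be made concrete with a back-of-the-envelope calculation. The vocabulary size, embedding dimension, and precision below are illustrative assumptions, not values from the paper.

```python
import math

def token_bits(vocab_size):
    # A single token drawn from a vocabulary of size |V| carries
    # at most log2|V| bits of information.
    return math.log2(vocab_size)

def latent_bits(dim, bits_per_coord):
    # A d-dimensional embedding at p bits of precision per coordinate
    # can carry up to d * p bits.
    return dim * bits_per_coord

# Illustrative numbers: a 32k vocabulary vs. a 4096-dim embedding at 16 bits.
per_token = token_bits(32_000)       # ~14.97 bits per CoT step
per_latent = latent_bits(4096, 16)   # 65,536 bits per CoT step
print(per_token, per_latent, per_latent / per_token)
```

Under these assumed numbers, each latent write carries thousands of times more bits than a token write, which is the intuition behind the claimed step savings.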
Theoretical lower bounds for token CoT on pointer chasing and parity

The authors prove that single-layer transformers need Ω(n/d) CoT steps for pointer chasing and constant-layer finite-precision transformers need Ω(n/polylog(n)) steps for parity, establishing fundamental limitations of token-based CoT that hold regardless of model dimension or computational cost per step.

6 retrieved papers · Can Refute
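For intuition, the asymptotic bounds above can be evaluated at sample parameter values. Note that Ω-notation suppresses constants, and the choice of polylog(n) = log2(n)^2 below is purely an illustrative assumption.

```python
import math

def pointer_chasing_steps(n, d):
    # Single-layer token CoT: Omega(n / d) steps for pointer chasing
    # (constants suppressed; this evaluates the bound's leading term).
    return n / d

def parity_steps(n):
    # Constant-layer finite-precision token CoT: Omega(n / polylog(n)) steps;
    # polylog(n) is taken as log2(n)**2 here purely for illustration.
    return n / (math.log2(n) ** 2)

print(pointer_chasing_steps(1_000_000, 1024))  # ~976.6 steps
print(parity_steps(1_000_000))                 # on the order of 2.5e3 steps
```

Even at a large model dimension, both bounds leave the required number of token-CoT steps growing nearly linearly in the input size n.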
Demonstration that latent CoT overcomes the information bottleneck

The authors prove that latent CoT (where models write high-dimensional embeddings instead of single tokens) reduces CoT length by roughly a factor of d for both pointer chasing and parity problems, demonstrating that the bottleneck is informational rather than computational and can be overcome by increasing write bandwidth.

10 retrieved papers
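A toy simulation can illustrate the claimed factor-d reduction in step count for parity. The step-counting model below (one partial result recorded per token step versus d partial results per latent step) is an illustrative assumption, not the paper's transformer construction.

```python
def parity_token_steps(bits):
    # Token CoT: record one partial parity per input bit -> ~n steps.
    acc, steps = 0, 0
    for b in bits:
        acc ^= b
        steps += 1
    return acc, steps

def parity_latent_steps(bits, d):
    # Latent CoT: each high-dimensional write absorbs a block of d bits
    # at once -> ~n/d steps.
    acc, steps = 0, 0
    for i in range(0, len(bits), d):
        for b in bits[i:i + d]:
            acc ^= b
        steps += 1
    return acc, steps

bits = [1, 0, 1, 1, 0, 1, 0, 0] * 16   # n = 128
p_tok, s_tok = parity_token_steps(bits)
p_lat, s_lat = parity_latent_steps(bits, d=16)
assert p_tok == p_lat                  # same answer either way
print(s_tok, s_lat)                    # 128 vs 8: a factor-d fewer steps
```

Both routines compute the same parity; only the number of writes to the transcript changes, matching the claim that the bottleneck is informational rather than computational.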

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current TopK core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape, it appears structurally isolated, which is one partial signal of novelty, though that signal remains constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution A: Identification of the information bottleneck in chain-of-thought reasoning (10 candidates examined, 1 potential refutation).

Contribution B: Theoretical lower bounds for token CoT on pointer chasing and parity (6 candidates examined, 1 potential refutation).

Contribution C: Demonstration that latent CoT overcomes the information bottleneck (10 candidates examined, 0 refutations).