The Information Bottleneck of Chain-of-Thought and How Latent CoT Overcomes It

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Chain-of-Thought, Latent CoT, Large language model
Abstract:

Chain-of-thought (CoT) has become the de facto paradigm for large language models (LLMs) to solve complex reasoning tasks. However, due to the sequential nature of token generation, inference time can become prohibitive when the CoT is exceedingly long. This paper identifies a fundamental \emph{information bottleneck} that can force the CoT to be long: although each forward pass activates a vast number of neurons, the information the model ultimately writes down is limited to a single token, so the model must produce many more CoT steps than necessary. We first establish this bottleneck theoretically by showing that for some natural problems, such as pointer chasing and computing parity, either single-layer transformers or constant-layer finite-precision transformers require a rather long CoT. We then show that for these same problems, allowing the transformer to write high-dimensional embeddings to the CoT (i.e., using latent CoT) significantly reduces the CoT length, establishing a provable theoretical benefit of latent CoT. We further validate our theory with controlled experiments: training a small transformer to simulate Conway's Game of Life with latent CoT, we vary the per-step write bandwidth to the latent CoT and observe a sharp success threshold proportional to the board size.
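To make the abstract's bandwidth-threshold intuition concrete, the following is a minimal sketch, not the paper's experimental setup: one synchronous Game of Life update, together with the number of bits a per-step latent write must carry to preserve the board state. The board size, the non-wrapping edge convention, and the one-bit-per-cell accounting are illustrative assumptions.

```python
from itertools import product

def life_step(board):
    # One synchronous Game of Life update on an n x n grid (non-wrapping edges).
    n = len(board)
    nxt = [[0] * n for _ in range(n)]
    for i, j in product(range(n), range(n)):
        live = sum(
            board[i + di][j + dj]
            for di in (-1, 0, 1) for dj in (-1, 0, 1)
            if (di, dj) != (0, 0) and 0 <= i + di < n and 0 <= j + dj < n
        )
        nxt[i][j] = 1 if live == 3 or (board[i][j] and live == 2) else 0
    return nxt

def bits_needed(n):
    # Carrying an n x n binary board across a step takes one bit per cell,
    # so a latent write narrower than n*n bits must discard board state.
    return n * n

# A 2x2 "block" still life stays fixed under life_step.
board = [[0, 0, 0, 0],
         [0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
assert life_step(board) == board
print(bits_needed(4))  # 16 bits per step for a 4x4 board
```

Under this accounting, the information that must cross each step grows with the board, which is consistent with the reported success threshold scaling with board size.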

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper establishes theoretical lower bounds on chain-of-thought length for token-based reasoning and demonstrates that latent representations can provably reduce this length. It resides in the 'Information Bottleneck and Capacity Analysis' leaf under 'Theoretical Foundations and Analysis', where it is currently the sole occupant; the taxonomy spans 50 papers in total. This positioning reflects a sparse theoretical niche within a field dominated by empirical compression methods and architectural innovations, suggesting the work addresses a relatively underexplored formal angle on reasoning efficiency.

The taxonomy reveals that most related work clusters in empirical branches: 'Compression of Explicit Chain-of-Thought' contains six papers on distillation and steering, 'Latent Space Reasoning Frameworks' holds six papers on continuous embedding prediction and implicit reasoning, and 'Hybrid Token-Latent Reasoning Architectures' includes seven papers on contemplation tokens and adaptive computation. The paper's theoretical sibling leaves—'Learning Theory for CoT Reasoning' and 'Architectural Depth and Looping'—each contain one paper, indicating that formal analysis of reasoning capacity remains less developed than method design. The scope and exclude notes clarify that this work focuses on information constraints rather than learning complexity or architectural depth.

Among 26 candidates examined, the analysis found limited prior overlap. Contribution A (information bottleneck identification) examined 10 candidates with 1 potential refutation; Contribution B (theoretical lower bounds) examined 6 candidates with 1 potential refutation; Contribution C (latent CoT benefits) examined 10 candidates with 0 refutations. The small number of refutable candidates suggests that within this limited search scope, the theoretical framing and formal bounds appear relatively novel. However, the search examined only top-K semantic matches and citations, not the full theoretical computer science or complexity theory literature, so stronger prior results may exist outside this scope.

Given the sparse theoretical branch and limited refutations among 26 examined candidates, the work appears to occupy a distinct formal niche. The analysis does not cover exhaustive complexity theory literature or all transformer expressiveness studies, so the novelty assessment reflects only the surveyed reasoning-focused papers. The theoretical contributions seem less crowded than the empirical compression and latent reasoning methods that dominate neighboring taxonomy branches.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 26
Refutable Papers: 2

Research Landscape Overview

Core task: Reducing chain-of-thought length through latent reasoning representations. The field addresses the computational and efficiency challenges of explicit chain-of-thought (CoT) reasoning by exploring how models can perform multi-step inference in compressed or implicit forms. The taxonomy reveals several complementary directions:

- Latent Space Reasoning Frameworks develop architectures that operate entirely in hidden representations (e.g., Hidden CoT Decoding[5], Reasoning in Dark[41]).
- Compression of Explicit Chain-of-Thought focuses on distilling verbose reasoning into shorter token sequences (Compressed CoT[1], Concise CoT Distillation[49]).
- Hybrid Token-Latent Reasoning Architectures blend explicit and implicit steps (CODI[2], System-1.5 Reasoning[20]).
- Reinforcement Learning for Latent Reasoning Optimization tunes latent policies to balance efficiency and accuracy (Hybrid Latent RL[36]).
- Multimodal Latent Reasoning extends these ideas to vision and other modalities (Latent CoT Driving[23], Dynamic Multimodal Interleaving[15]).
- Theoretical Foundations and Analysis examines capacity, information flow, and expressiveness (Information Bottleneck CoT[0], Autoregressive CoT Theory[40]).
- Meta-Level Reasoning Control dynamically adjusts reasoning depth (ReflCtrl[24], Adaptive Latent Reasoning[34]).

Surveys and broad overviews (Latent Reasoning Survey[3], Large Reasoning Models Survey[16]) synthesize these threads, while peripheral branches capture tangentially related work. A central tension runs through many branches: how much reasoning can be internalized without sacrificing interpretability or performance. Works like Hidden Thinking[8] and Think Silently Fast[21] push toward fully opaque latent computation, while Compressed CoT[1] and Self-Training Concise Reasoning[13] retain token-level traces in condensed form.
Information Bottleneck CoT[0] sits within the Theoretical Foundations and Analysis branch, specifically under Information Bottleneck and Capacity Analysis, where it formalizes the trade-off between reasoning compression and task accuracy using information-theoretic principles. This theoretical lens complements empirical compression methods (Compressed CoT[1]) and architectural innovations (CODI[2]) by providing rigorous bounds on what latent representations can achieve. Compared to neighboring theoretical work like Autoregressive CoT Theory[40] or Latency-Response Theory[38], Information Bottleneck CoT[0] emphasizes capacity constraints and optimal encoding, offering a principled framework for understanding when and why latent reasoning succeeds or fails.

Claimed Contributions

Identification of the information bottleneck in chain-of-thought reasoning

The authors identify that token-based chain-of-thought reasoning suffers from an information bottleneck where each forward pass can only write O(log |V|) bits of information (a single token) to the transcript, forcing models to use many more CoT steps than necessary despite having high-dimensional internal representations.

10 retrieved papers · Can Refute
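The gap this contribution describes, O(log |V|) bits per token versus a d-dimensional write, can be made concrete with a back-of-the-envelope calculation. The vocabulary size, embedding dimension, and precision below are illustrative assumptions, not values from the paper.

```python
import math

def token_bits(vocab_size):
    # A single token drawn from a vocabulary of size |V| carries
    # at most log2|V| bits of information.
    return math.log2(vocab_size)

def latent_bits(dim, bits_per_coord):
    # A d-dimensional embedding at p bits of precision per coordinate
    # can carry up to d * p bits.
    return dim * bits_per_coord

# Illustrative numbers: a 32k vocabulary vs. a 4096-dim embedding at 16 bits.
per_token = token_bits(32_000)       # ~14.97 bits per CoT step
per_latent = latent_bits(4096, 16)   # 65,536 bits per CoT step
print(per_token, per_latent, per_latent / per_token)
```

Under these assumed numbers, each latent write carries thousands of times more bits than a token write, which is the intuition behind the claimed step savings.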
Theoretical lower bounds for token CoT on pointer chasing and parity

The authors prove that single-layer transformers need Ω(n/d) CoT steps for pointer chasing and constant-layer finite-precision transformers need Ω(n/polylog(n)) steps for parity, establishing fundamental limitations of token-based CoT that hold regardless of model dimension or computational cost per step.

6 retrieved papers · Can Refute
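For intuition, the asymptotic bounds above can be evaluated at sample parameter values. Note that Ω-notation suppresses constants, and the choice of polylog(n) = log2(n)^2 below is purely an illustrative assumption.

```python
import math

def pointer_chasing_steps(n, d):
    # Single-layer token CoT: Omega(n / d) steps for pointer chasing
    # (constants suppressed; this evaluates the bound's leading term).
    return n / d

def parity_steps(n):
    # Constant-layer finite-precision token CoT: Omega(n / polylog(n)) steps;
    # polylog(n) is taken as log2(n)**2 here purely for illustration.
    return n / (math.log2(n) ** 2)

print(pointer_chasing_steps(1_000_000, 1024))  # ~976.6 steps
print(parity_steps(1_000_000))                 # on the order of 2.5e3 steps
```

Even at a large model dimension, both bounds leave the required number of token-CoT steps growing nearly linearly in the input size n.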
Demonstration that latent CoT overcomes the information bottleneck

The authors prove that latent CoT (where models write high-dimensional embeddings instead of single tokens) reduces CoT length by roughly a factor of d for both pointer chasing and parity problems, demonstrating that the bottleneck is informational rather than computational and can be overcome by increasing write bandwidth.

10 retrieved papers
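A toy simulation can illustrate the claimed factor-d reduction in step count for parity. The step-counting model below (one partial result recorded per token step versus d partial results per latent step) is an illustrative assumption, not the paper's transformer construction.

```python
def parity_token_steps(bits):
    # Token CoT: record one partial parity per input bit -> ~n steps.
    acc, steps = 0, 0
    for b in bits:
        acc ^= b
        steps += 1
    return acc, steps

def parity_latent_steps(bits, d):
    # Latent CoT: each high-dimensional write absorbs a block of d bits
    # at once -> ~n/d steps.
    acc, steps = 0, 0
    for i in range(0, len(bits), d):
        for b in bits[i:i + d]:
            acc ^= b
        steps += 1
    return acc, steps

bits = [1, 0, 1, 1, 0, 1, 0, 0] * 16   # n = 128
p_tok, s_tok = parity_token_steps(bits)
p_lat, s_lat = parity_latent_steps(bits, d=16)
assert p_tok == p_lat                  # same answer either way
print(s_tok, s_lat)                    # 128 vs 8: a factor-d fewer steps
```

Both routines compute the same parity; only the number of writes to the transcript changes, matching the claim that the bottleneck is informational rather than computational.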

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current TopK core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape, it appears structurally isolated, which is one partial signal of novelty, though that signal remains constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution A: Identification of the information bottleneck in chain-of-thought reasoning (10 candidates examined, 1 potential refutation).

Contribution B: Theoretical lower bounds for token CoT on pointer chasing and parity (6 candidates examined, 1 potential refutation).

Contribution C: Demonstration that latent CoT overcomes the information bottleneck (10 candidates examined, 0 refutations).