Transformers are Inherently Succinct

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Transformers, complexity, expressivity, automata, LTL
Abstract:

We propose succinctness as a measure of the expressive power of a transformer in describing a concept. To this end, we prove that transformers are highly expressive in that they can represent formal languages substantially more succinctly than standard representations such as finite automata and Linear Temporal Logic (LTL) formulas. As a by-product of this expressivity, verifying even simple properties of transformers is shown to be provably intractable (i.e., EXPSPACE-complete).

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes succinctness—how compactly a transformer represents a concept—as a measure of expressive power, proving that transformers can encode formal languages exponentially or doubly exponentially more compactly than finite automata or Linear Temporal Logic formulas. It resides in the 'Equivalence to Formal Language Classes' leaf, which contains six papers total, including the original work. This leaf sits within the broader 'Expressive Power and Formal Language Recognition' branch, indicating a moderately populated research direction focused on formal characterizations rather than practical applications or learning dynamics.

The taxonomy reveals neighboring leaves examining 'Recognition Capabilities for Specific Language Families' (four papers on context-free grammars and counter languages) and 'Weighted Automata and Sequential Models' (two papers on sequential reasoning). The 'Succinctness and Representational Efficiency' branch exists as a separate top-level category but contains only one paper on recurrent extensions, suggesting the original paper's focus on succinctness comparisons with classical models occupies relatively sparse territory. The scope notes clarify that equivalence studies exclude depth hierarchies and practical implementation, distinguishing this work from architectural constraint analyses in adjacent branches.

Among thirty candidates examined, the succinctness measure itself (Contribution 1) and the exponential/doubly exponential gaps (Contribution 2) each faced ten candidates with zero refutations, suggesting these representational efficiency claims are relatively novel within the limited search scope. The EXPSPACE-completeness result (Contribution 3) encountered one refutable candidate among ten examined, indicating some prior complexity analysis exists but the specific verification problem formulation may differ. The statistics reflect a focused semantic search rather than exhaustive coverage, so these findings characterize novelty relative to the most semantically similar thirty papers, not the entire field.

Based on the limited search scope and taxonomy structure, the work appears to occupy a distinctive position emphasizing representational efficiency over pure expressiveness characterizations. The sibling papers in the same leaf focus more on equivalence proofs and language class boundaries, while the succinctness angle bridges to complexity theory in a way that neighboring works do not extensively explore. However, the single refutable candidate for the complexity result suggests some overlap with prior verification analyses, warranting careful comparison in a full review.

Taxonomy

38 Core-task Taxonomy Papers
3 Claimed Contributions
30 Contribution Candidate Papers Compared
1 Refutable Paper

Research Landscape Overview

Core task: succinctness of transformers in representing formal languages. The field examines how efficiently transformer architectures can encode and recognize formal languages compared to classical automata. The taxonomy reveals several major branches: one focuses on expressive power and equivalence to formal language classes, asking which languages transformers can recognize and how they compare to finite automata or context-free grammars; another investigates depth and architectural constraints, exploring trade-offs between model size and representational capacity; a third addresses succinctness and representational efficiency, quantifying how compactly transformers encode languages relative to traditional models. Additional branches cover learning dynamics, pre-training strategies on formal data, automata extraction methods for interpretability, practical applications, and broader surveys.

Works like Transformers Formal Languages Survey[14] and Formal Languages Transformers Survey[28] provide overarching perspectives, while studies such as Transformers Recognizers Transducers[15] and Attention Patterns Context Free[5] establish formal connections between transformer components and classical language hierarchies. A particularly active line of inquiry concerns the precise characterization of transformer expressivity: some works demonstrate equivalences to specific automata classes under various precision or attention constraints, as seen in Fixed Precision Expressivity[8] and Masked Attention Star Free[20], while others like Characterizing Transformer Expressivity[21] offer broader frameworks.

The original paper, Transformers Inherently Succinct[0], sits squarely within the branch on equivalence to formal language classes but emphasizes a representational efficiency angle, arguing that transformers can encode certain languages more compactly than classical models. This contrasts with neighboring works such as Attention Patterns Context Free[5], which focuses on structural correspondence to context-free grammars, and Transformers Recognizers Transducers[15], which explores transduction capabilities. Together, these studies highlight an ongoing tension between proving what transformers can represent in principle and understanding the resource costs of such representations, a central open question in the field.

Claimed Contributions

Succinctness as a measure of transformer expressive power

The authors introduce succinctness—the smallest descriptional size needed to recognize a language—as an alternative measure of expressiveness for transformers. This measure captures how compactly transformers can represent concepts compared to other formalisms.
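The measure described above can be sketched in standard descriptional-complexity notation. The symbols below are ours, a plausible reconstruction rather than the paper's exact definitions:

```latex
% For a formalism F (transformers, DFAs, LTL formulas, ...) and a language L
% expressible in F, let |M| denote the description size of a device M. Then
\[
  \mathrm{size}_F(L) \;=\; \min\{\, |M| \;:\; M \in F \text{ and } L(M) = L \,\}.
\]
% F is exponentially more succinct than G on a family (L_n)_{n \ge 1} if
\[
  \mathrm{size}_F(L_n) \;=\; \mathrm{poly}(n)
  \quad\text{while}\quad
  \mathrm{size}_G(L_n) \;=\; 2^{\Omega(n)}.
\]
```

On this reading, a succinctness gap between two formalisms is witnessed by a family of languages whose minimal descriptions diverge in size.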

10 retrieved papers

Exponential and doubly exponential succinctness gaps

The authors prove that transformers can represent languages exponentially more succinctly than Linear Temporal Logic and Recurrent Neural Networks, and doubly exponentially more succinctly than finite automata. This demonstrates that transformers encode complex patterns with significantly smaller descriptional sizes.
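To make the notion of a succinctness gap concrete, here is a minimal Python sketch of a classic gap from automata theory; it is an illustrative analogue, not the paper's transformer construction:

```python
# A classic succinctness gap (an analogue, not the paper's construction):
# L_n = { w over {a,b} : the n-th symbol from the end of w is 'a' } has an
# O(n)-size description, yet its minimal DFA is known to need 2^n states,
# because a DFA must remember the last n symbols it has read.
from itertools import product

def dfa_for_Ln(n):
    """The natural DFA for L_n: each state records the last n symbols seen."""
    states = [''.join(p) for p in product('ab', repeat=n)]
    start = 'b' * n                                  # pad with rejecting 'b's
    delta = {(q, c): q[1:] + c for q in states for c in 'ab'}
    accept = {q for q in states if q[0] == 'a'}      # oldest remembered symbol
    return states, start, delta, accept

def accepts(dfa, w):
    states, q, delta, accept = dfa
    for c in w:
        q = delta[(q, c)]
    return q in accept

n = 4
dfa = dfa_for_Ln(n)
print(len(dfa[0]))              # 2**n = 16 states
print(accepts(dfa, 'abbb'))     # 4th symbol from the end is 'a' -> True
print(accepts(dfa, 'babb'))     # 4th symbol from the end is 'b' -> False
```

The description of L_n grows linearly in n while the state count printed above doubles with each increment of n; this is the shape of the exponential gaps claimed here, with the paper's transformer-versus-automata gap being doubly exponential.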

10 retrieved papers

EXPSPACE-completeness of transformer verification

The authors establish that verifying simple properties about transformers, such as checking whether they recognize a trivial language, is EXPSPACE-complete. This result shows that transformer verification is computationally intractable under standard complexity-theoretic assumptions.
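As an intuition pump (ours, not the paper's argument), the toy Python sketch below shows why even a simple verification question invites exhaustive search over an input space that grows exponentially with length; the EXPSPACE-hardness result is of course a much stronger statement than this brute-force observation:

```python
# Toy sketch of why verification blows up: even the naive check "does this
# model accept any string at all?" must, in the worst case, enumerate all
# |alphabet|**n inputs of each length n. `model` is a hypothetical stand-in
# for a trained transformer's accept/reject behavior.
from itertools import product

def accepts_some_string(model, alphabet, max_len):
    """Bounded non-emptiness check: try every string of length <= max_len."""
    checked = 0
    for n in range(max_len + 1):
        for w in product(alphabet, repeat=n):
            checked += 1
            if model(''.join(w)):
                return True, checked
    return False, checked

# Toy stand-in model: accepts exactly the strings containing 'ab'.
toy = lambda w: 'ab' in w
found, work = accepts_some_string(toy, 'ab', 4)
print(found, work)   # True 5 ('ab' found after '', 'a', 'b', 'aa', 'ab')
```

For a model accepting nothing, the same loop examines all 2**(max_len+1) - 1 strings before giving up, and no finite max_len settles the unbounded question.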

10 retrieved papers
Can refute: 1 paper

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1

Succinctness as a measure of transformer expressive power

The authors introduce succinctness—the smallest descriptional size needed to recognize a language—as an alternative measure of expressiveness for transformers. This measure captures how compactly transformers can represent concepts compared to other formalisms.

Contribution 2

Exponential and doubly exponential succinctness gaps

The authors prove that transformers can represent languages exponentially more succinctly than Linear Temporal Logic and Recurrent Neural Networks, and doubly exponentially more succinctly than finite automata. This demonstrates that transformers encode complex patterns with significantly smaller descriptional sizes.

Contribution 3

EXPSPACE-completeness of transformer verification

The authors establish that verifying simple properties about transformers, such as checking whether they recognize a trivial language, is EXPSPACE-complete. This result shows that transformer verification is computationally intractable under standard complexity-theoretic assumptions.