Transformers are Inherently Succinct

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Transformers, complexity, expressivity, automata, LTL
Abstract:

We propose succinctness as a measure of the expressive power of a transformer in describing a concept. To this end, we prove that transformers are highly expressive in that they can represent formal languages substantially more succinctly than standard representations such as finite automata and Linear Temporal Logic (LTL) formulas. As a by-product of this expressivity, verifying even simple properties of transformers is shown to be provably intractable (i.e., EXPSPACE-complete).

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes succinctness—how compactly a transformer represents a concept—as a measure of expressive power, proving that transformers can encode formal languages exponentially or doubly exponentially more compactly than finite automata or Linear Temporal Logic formulas. It resides in the 'Equivalence to Formal Language Classes' leaf, which contains six papers total, including the original work. This leaf sits within the broader 'Expressive Power and Formal Language Recognition' branch, indicating a moderately populated research direction focused on formal characterizations rather than practical applications or learning dynamics.

The taxonomy reveals neighboring leaves examining 'Recognition Capabilities for Specific Language Families' (four papers on context-free grammars and counter languages) and 'Weighted Automata and Sequential Models' (two papers on sequential reasoning). The 'Succinctness and Representational Efficiency' branch exists as a separate top-level category but contains only one paper on recurrent extensions, suggesting the original paper's focus on succinctness comparisons with classical models occupies relatively sparse territory. The scope notes clarify that equivalence studies exclude depth hierarchies and practical implementation, distinguishing this work from architectural constraint analyses in adjacent branches.

Among thirty candidates examined, the succinctness measure itself (Contribution 1) and the exponential/doubly exponential gaps (Contribution 2) each faced ten candidates with zero refutations, suggesting these representational efficiency claims are relatively novel within the limited search scope. The EXPSPACE-completeness result (Contribution 3) encountered one refutable candidate among ten examined, indicating some prior complexity analysis exists but the specific verification problem formulation may differ. The statistics reflect a focused semantic search rather than exhaustive coverage, so these findings characterize novelty relative to the most semantically similar thirty papers, not the entire field.

Based on the limited search scope and taxonomy structure, the work appears to occupy a distinctive position emphasizing representational efficiency over pure expressiveness characterizations. The sibling papers in the same leaf focus more on equivalence proofs and language class boundaries, while the succinctness angle bridges to complexity theory in a way that neighboring works do not extensively explore. However, the single refutable candidate for the complexity result suggests some overlap with prior verification analyses, warranting careful comparison in a full review.

Taxonomy

38 Core-task Taxonomy Papers
3 Claimed Contributions
30 Contribution Candidate Papers Compared
1 Refutable Paper

Research Landscape Overview

Core task: succinctness of transformers in representing formal languages. The field examines how efficiently transformer architectures can encode and recognize formal languages compared to classical automata. The taxonomy reveals several major branches: one focuses on expressive power and equivalence to formal language classes, asking which languages transformers can recognize and how they compare to finite automata or context-free grammars; another investigates depth and architectural constraints, exploring trade-offs between model size and representational capacity; a third addresses succinctness and representational efficiency, quantifying how compactly transformers encode languages relative to traditional models. Additional branches cover learning dynamics, pre-training strategies on formal data, automata extraction methods for interpretability, practical applications, and broader surveys.

Works like Transformers Formal Languages Survey[14] and Formal Languages Transformers Survey[28] provide overarching perspectives, while studies such as Transformers Recognizers Transducers[15] and Attention Patterns Context Free[5] establish formal connections between transformer components and classical language hierarchies. A particularly active line of inquiry concerns the precise characterization of transformer expressivity: some works demonstrate equivalences to specific automata classes under various precision or attention constraints, as seen in Fixed Precision Expressivity[8] and Masked Attention Star Free[20], while others like Characterizing Transformer Expressivity[21] offer broader frameworks.

The original paper, Transformers Inherently Succinct[0], sits squarely within the branch on equivalence to formal language classes but emphasizes a representational efficiency angle, arguing that transformers can encode certain languages more compactly than classical models. This contrasts with neighboring works such as Attention Patterns Context Free[5], which focuses on structural correspondence to context-free grammars, and Transformers Recognizers Transducers[15], which explores transduction capabilities. Together, these studies highlight an ongoing tension between proving what transformers can represent in principle and understanding the resource costs of such representations, a central open question in the field.

Claimed Contributions

Succinctness as a measure of transformer expressive power

The authors introduce succinctness—the smallest descriptional size needed to recognize a language—as an alternative measure of expressiveness for transformers. This measure captures how compactly transformers can represent concepts compared to other formalisms.
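The measure described above can be sketched in standard descriptional-complexity notation. The symbols below are ours, a plausible reconstruction rather than the paper's exact definitions:

```latex
% For a formalism F (transformers, DFAs, LTL formulas, ...) and a language L
% expressible in F, let |M| denote the description size of a device M. Then
\[
  \mathrm{size}_F(L) \;=\; \min\{\, |M| \;:\; M \in F \text{ and } L(M) = L \,\}.
\]
% F is exponentially more succinct than G on a family (L_n)_{n \ge 1} if
\[
  \mathrm{size}_F(L_n) \;=\; \mathrm{poly}(n)
  \quad\text{while}\quad
  \mathrm{size}_G(L_n) \;=\; 2^{\Omega(n)}.
\]
```

On this reading, a succinctness gap between two formalisms is witnessed by a family of languages whose minimal descriptions diverge in size.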

10 retrieved papers

Exponential and doubly exponential succinctness gaps

The authors prove that transformers can represent languages exponentially more succinctly than Linear Temporal Logic and Recurrent Neural Networks, and doubly exponentially more succinctly than finite automata. This demonstrates that transformers encode complex patterns with significantly smaller descriptional sizes.
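To make the notion of a succinctness gap concrete, here is a minimal Python sketch of a classic gap from automata theory; it is an illustrative analogue, not the paper's transformer construction:

```python
# A classic succinctness gap (an analogue, not the paper's construction):
# L_n = { w over {a,b} : the n-th symbol from the end of w is 'a' } has an
# O(n)-size description, yet its minimal DFA is known to need 2^n states,
# because a DFA must remember the last n symbols it has read.
from itertools import product

def dfa_for_Ln(n):
    """The natural DFA for L_n: each state records the last n symbols seen."""
    states = [''.join(p) for p in product('ab', repeat=n)]
    start = 'b' * n                                  # pad with rejecting 'b's
    delta = {(q, c): q[1:] + c for q in states for c in 'ab'}
    accept = {q for q in states if q[0] == 'a'}      # oldest remembered symbol
    return states, start, delta, accept

def accepts(dfa, w):
    states, q, delta, accept = dfa
    for c in w:
        q = delta[(q, c)]
    return q in accept

n = 4
dfa = dfa_for_Ln(n)
print(len(dfa[0]))              # 2**n = 16 states
print(accepts(dfa, 'abbb'))     # 4th symbol from the end is 'a' -> True
print(accepts(dfa, 'babb'))     # 4th symbol from the end is 'b' -> False
```

The description of L_n grows linearly in n while the state count printed above doubles with each increment of n; this is the shape of the exponential gaps claimed here, with the paper's transformer-versus-automata gap being doubly exponential.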

10 retrieved papers

EXPSPACE-completeness of transformer verification

The authors establish that verifying simple properties about transformers, such as checking whether they recognize a trivial language, is EXPSPACE-complete. This result shows that transformer verification is computationally intractable under standard complexity-theoretic assumptions.
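As an intuition pump (ours, not the paper's argument), the toy Python sketch below shows why even a simple verification question invites exhaustive search over an input space that grows exponentially with length; the EXPSPACE-hardness result is of course a much stronger statement than this brute-force observation:

```python
# Toy sketch of why verification blows up: even the naive check "does this
# model accept any string at all?" must, in the worst case, enumerate all
# |alphabet|**n inputs of each length n. `model` is a hypothetical stand-in
# for a trained transformer's accept/reject behavior.
from itertools import product

def accepts_some_string(model, alphabet, max_len):
    """Bounded non-emptiness check: try every string of length <= max_len."""
    checked = 0
    for n in range(max_len + 1):
        for w in product(alphabet, repeat=n):
            checked += 1
            if model(''.join(w)):
                return True, checked
    return False, checked

# Toy stand-in model: accepts exactly the strings containing 'ab'.
toy = lambda w: 'ab' in w
found, work = accepts_some_string(toy, 'ab', 4)
print(found, work)   # True 5 ('ab' found after '', 'a', 'b', 'aa', 'ab')
```

For a model accepting nothing, the same loop examines all 2**(max_len+1) - 1 strings before giving up, and no finite max_len settles the unbounded question.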

10 retrieved papers
Can refute: 1 paper

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1

Succinctness as a measure of transformer expressive power

The authors introduce succinctness—the smallest descriptional size needed to recognize a language—as an alternative measure of expressiveness for transformers. This measure captures how compactly transformers can represent concepts compared to other formalisms.

Contribution 2

Exponential and doubly exponential succinctness gaps

The authors prove that transformers can represent languages exponentially more succinctly than Linear Temporal Logic and Recurrent Neural Networks, and doubly exponentially more succinctly than finite automata. This demonstrates that transformers encode complex patterns with significantly smaller descriptional sizes.

Contribution 3

EXPSPACE-completeness of transformer verification

The authors establish that verifying simple properties about transformers, such as checking whether they recognize a trivial language, is EXPSPACE-complete. This result shows that transformer verification is computationally intractable under standard complexity-theoretic assumptions.