The Counting Power of Transformers

ICLR 2026 Conference Submission (Anonymous Authors)
Keywords: FLaNN, expressiveness, attention, formal languages
Abstract:

Counting properties (e.g., determining whether certain tokens occur more often than other tokens in a given input text) have played a significant role in the study of the expressiveness of transformers. In this paper, we provide a formal framework for investigating the counting power of transformers. We argue that all existing results demonstrate transformers' expressivity only for (semi-)linear counting properties, i.e., those expressible as a boolean combination of linear inequalities. Our main result is that transformers can express counting properties that are highly nonlinear. More precisely, we prove that transformers can capture all semialgebraic counting properties, i.e., those expressible as a boolean combination of arbitrary multivariate polynomial inequalities (of any degree). Among other things, these generalize the counting properties that can be captured by support vector machines with polynomial kernels in the vector space model. To complement this result, we exhibit a natural subclass of (softmax) transformers that completely characterizes the semialgebraic counting properties. Through connections with Hilbert's tenth problem, this expressivity of transformers also yields a new undecidability result for analyzing an extremely simple transformer model, surprisingly one with neither positional encodings (i.e., NoPE-transformers) nor masking. We also experimentally validate the trainability of such counting properties.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper establishes that transformers can express semialgebraic counting properties—those defined by boolean combinations of arbitrary multivariate polynomials—and provides a formal framework for analyzing counting power. It resides in the Logic-Based Expressiveness Characterization leaf, which contains five papers total, including the original work. This leaf sits within the broader Theoretical Expressiveness and Formal Characterization branch, indicating a moderately populated research direction focused on formal proofs rather than empirical validation. The sibling papers similarly use logical or formal language frameworks to characterize transformer expressiveness, suggesting this is an active but not overcrowded niche.

The taxonomy reveals that theoretical expressiveness research divides into logic-based characterizations, complexity-theoretic bounds, and positional encoding analyses. The original paper's leaf neighbors the Complexity-Theoretic Bounds leaf (four papers) and the Positional Encoding leaf (three papers), both addressing complementary aspects of transformer power. The taxonomy's scope_note clarifies that logic-based work establishes equivalences with formal systems, while complexity studies prove computational limits—this paper bridges both by using logical frameworks to demonstrate nonlinear counting capacity. The broader Empirical Analysis branch (seven papers across three leaves) explores whether these theoretical capabilities manifest in practice, highlighting a field-wide gap between formal expressiveness and learned behavior.

Among twenty candidates examined, Contribution A (formal framework) showed no clear refutation across ten candidates, suggesting novelty in how the framework is structured. Contribution B (semialgebraic properties) examined four candidates and found one refutable match, indicating some prior work addresses nonlinear counting expressiveness, though the scale of overlap remains unclear from this limited search. Contribution C (uniform average hard attention characterization) examined six candidates with one refutable match, suggesting partial overlap in architectural characterization approaches. The analysis explicitly covers top-K semantic matches plus citation expansion, not an exhaustive survey of all transformer expressiveness literature.

Given the limited search scope of twenty candidates, the paper appears to advance the logic-based expressiveness agenda by moving beyond semilinear properties to semialgebraic ones, though at least two contributions show some prior work overlap. The taxonomy context suggests this work extends an established research direction rather than opening an entirely new area. A broader literature search might reveal additional overlapping results, particularly in the complexity-theoretic or empirical branches not fully covered here.

Taxonomy

Core-task taxonomy papers: 50
Claimed contributions: 3
Contribution candidate papers compared: 20
Refutable papers: 2

Research Landscape Overview

Core task: expressiveness of transformers for counting properties. The field divides into several main branches that reflect different angles on how well transformers can capture counting and related quantitative reasoning.

Theoretical Expressiveness and Formal Characterization investigates the fundamental computational limits of transformer architectures through logic-based frameworks and complexity-theoretic analyses, asking which counting tasks are representable in principle and what architectural features enable or constrain this capacity. Works such as Logical Languages[16] and Tighter Bounds[23] exemplify efforts to map transformer expressiveness onto formal language hierarchies and circuit complexity classes.

Empirical Analysis and Training Dynamics explores how transformers learn counting in practice, examining phenomena like length generalization, the role of positional encodings, and inductive biases that emerge during training, as illustrated by studies like Counting Like Transformers[5] and Inductive Biases Count[8].

Practical Counting Applications encompasses a dense branch of vision-based crowd counting, object enumeration, and domain-specific tasks (e.g., cell counting, tree counting from aerial imagery), where transformer-based architectures are deployed and refined for real-world performance. A smaller set of works falls outside these core themes, addressing tangential or unrelated topics.

Within the theoretical branch, a particularly active line of inquiry uses logical languages to characterize what transformers can and cannot express, contrasting the power of attention mechanisms with classical automata and first-order logic extensions. Counting Power[0] sits squarely in this logic-based expressiveness cluster, closely aligned with Logical Languages[16] and Graph Logic[49], which similarly formalize counting capabilities through logical frameworks. Compared to Tighter Bounds[23], which emphasizes circuit-theoretic lower bounds, Counting Power[0] focuses more directly on the interplay between transformer depth, attention structure, and the ability to solve counting problems defined in logical terms. Meanwhile, empirical studies like When Count[1] and Teaching Arithmetic[18] probe whether trained models exhibit the counting behaviors predicted by theory, highlighting an ongoing tension between what transformers can represent in principle and what they reliably learn from data. This landscape reveals open questions about bridging formal guarantees with practical learnability and scaling.

Claimed Contributions

Formal framework for counting power of transformers

The authors introduce a formal framework to study the expressiveness of transformers for counting properties, which are permutation-closed languages where acceptance depends only on the number of occurrences of tokens rather than their order.
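The definition above can be made concrete with a small sketch: a counting property accepts a word based solely on its Parikh vector (the vector of token counts), so membership is invariant under any permutation of the input. The predicate below (more a's than b's) is our own toy illustration, not one taken from the paper.

```python
from collections import Counter

def parikh(word):
    """Map a word to its token-count vector (Parikh vector)."""
    return Counter(word)

def more_as_than_bs(word):
    """A toy counting property: accept iff #a > #b."""
    c = parikh(word)
    return c["a"] > c["b"]

# Permutation-closure: every reordering of an accepted word is also accepted,
# because the predicate only ever sees the counts, never the positions.
assert more_as_than_bs("aab") == more_as_than_bs("aba") == more_as_than_bs("baa")
```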

Retrieved papers: 10
Transformers capture semialgebraic counting properties

The main theoretical result demonstrates that transformers can express counting properties defined by boolean combinations of polynomial inequalities of arbitrary degree, generalizing beyond the linear counting properties captured by existing models like C-RASP.
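The linear-versus-semialgebraic distinction can be sketched with two toy predicates of our own (not drawn from the paper): a semilinear counting property uses only linear inequalities over token counts, while a semialgebraic one may combine polynomial inequalities of arbitrary degree.

```python
from collections import Counter

def linear_property(word):
    """Semilinear toy example: #a - 2*#b >= 0 (a degree-1 inequality)."""
    c = Counter(word)
    return c["a"] - 2 * c["b"] >= 0

def semialgebraic_property(word):
    """Semialgebraic toy example: (#a)^2 >= #b * #c (a degree-2 inequality)."""
    c = Counter(word)
    return c["a"] ** 2 >= c["b"] * c["c"]

# Both depend only on token counts, so both are counting properties;
# only the second requires a nonlinear polynomial over the counts.
```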

Retrieved papers: 4 (can refute)
Characterization via uniform average hard attention transformers

The authors prove that NoPE-AHAT and NoPE-AHAT[U] (transformers without positional encodings using average hard attention with uniform layers) precisely capture semialgebraic counting properties, providing a complete characterization of this expressiveness class.
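A minimal numerical sketch of one average-hard-attention step, assuming the standard definition in the AHAT literature: attention mass is split uniformly over the positions that attain the maximum score, instead of applying a softmax. Shapes and variable names here are our own illustration, not the paper's notation.

```python
import numpy as np

def average_hard_attention(scores, values):
    """One AHAT step.

    scores: (n,) attention scores for the n positions.
    values: (n, d) value vectors.
    Returns the (d,) output: the uniform average of the value
    vectors at the positions tied for the maximum score.
    """
    best = np.isclose(scores, scores.max())   # positions tied for the max
    weights = best / best.sum()               # uniform weight over the ties
    return weights @ values                   # attended output

scores = np.array([1.0, 3.0, 3.0, 0.5])
values = np.eye(4)
out = average_hard_attention(scores, values)  # averages rows 1 and 2
```

With `values = np.eye(4)` the output directly exposes the attention weights: positions 1 and 2 share the maximum score, so each receives weight 1/2.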

Retrieved papers: 6 (can refute)

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution
