The Counting Power of Transformers

ICLR 2026 Conference Submission (Anonymous Authors)
Keywords: FLaNN, expressiveness, attention, formal languages
Abstract:

Counting properties (e.g., determining whether certain tokens occur more often than other tokens in a given input text) have played a significant role in the study of the expressiveness of transformers. In this paper, we provide a formal framework for investigating the counting power of transformers. We argue that all existing results demonstrate transformers' expressivity only for (semi-)linear counting properties, i.e., those expressible as a boolean combination of linear inequalities. Our main result is that transformers can express counting properties that are highly nonlinear. More precisely, we prove that transformers can capture all semialgebraic counting properties, i.e., those expressible as a boolean combination of arbitrary multivariate polynomial inequalities (of any degree). Among other things, these generalize the counting properties that can be captured by support vector machines with polynomial kernels in the vector space model. To complement this result, we exhibit a natural subclass of (softmax) transformers that completely characterizes the semialgebraic counting properties. Through connections with Hilbert's tenth problem, this expressivity of transformers also yields a new undecidability result for analyzing an extremely simple transformer model, surprisingly one with neither positional encodings (i.e., NoPE-transformers) nor masking. We also experimentally validate the trainability of such counting properties.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper establishes that transformers can express semialgebraic counting properties—those defined by boolean combinations of arbitrary multivariate polynomials—and provides a formal framework for analyzing counting power. It resides in the Logic-Based Expressiveness Characterization leaf, which contains five papers total, including the original work. This leaf sits within the broader Theoretical Expressiveness and Formal Characterization branch, indicating a moderately populated research direction focused on formal proofs rather than empirical validation. The sibling papers similarly use logical or formal language frameworks to characterize transformer expressiveness, suggesting this is an active but not overcrowded niche.

The taxonomy reveals that theoretical expressiveness research divides into logic-based characterizations, complexity-theoretic bounds, and positional encoding analyses. The original paper's leaf neighbors the Complexity-Theoretic Bounds leaf (four papers) and the Positional Encoding leaf (three papers), both addressing complementary aspects of transformer power. The taxonomy's scope_note clarifies that logic-based work establishes equivalences with formal systems, while complexity studies prove computational limits—this paper bridges both by using logical frameworks to demonstrate nonlinear counting capacity. The broader Empirical Analysis branch (seven papers across three leaves) explores whether these theoretical capabilities manifest in practice, highlighting a field-wide gap between formal expressiveness and learned behavior.

Among twenty candidates examined, Contribution A (formal framework) showed no clear refutation across ten candidates, suggesting novelty in how the framework is structured. Contribution B (semialgebraic properties) examined four candidates and found one refutable match, indicating some prior work addresses nonlinear counting expressiveness, though the scale of overlap remains unclear from this limited search. Contribution C (uniform average hard attention characterization) examined six candidates with one refutable match, suggesting partial overlap in architectural characterization approaches. The analysis explicitly covers top-K semantic matches plus citation expansion, not an exhaustive survey of all transformer expressiveness literature.

Given the limited search scope of twenty candidates, the paper appears to advance the logic-based expressiveness agenda by moving beyond semilinear properties to semialgebraic ones, though at least two contributions show some prior work overlap. The taxonomy context suggests this work extends an established research direction rather than opening an entirely new area. A broader literature search might reveal additional overlapping results, particularly in the complexity-theoretic or empirical branches not fully covered here.

Taxonomy

Core-task taxonomy papers: 50
Claimed contributions: 3
Contribution candidate papers compared: 20
Refutable papers: 2

Research Landscape Overview

Core task: expressiveness of transformers for counting properties. The field divides into several main branches that reflect different angles on how well transformers can capture counting and related quantitative reasoning.

Theoretical Expressiveness and Formal Characterization investigates the fundamental computational limits of transformer architectures through logic-based frameworks and complexity-theoretic analyses, asking which counting tasks are representable in principle and what architectural features enable or constrain this capacity. Works such as Logical Languages[16] and Tighter Bounds[23] exemplify efforts to map transformer expressiveness onto formal language hierarchies and circuit complexity classes.

Empirical Analysis and Training Dynamics explores how transformers learn counting in practice, examining phenomena like length generalization, the role of positional encodings, and inductive biases that emerge during training, as illustrated by studies like Counting Like Transformers[5] and Inductive Biases Count[8].

Practical Counting Applications encompasses a dense branch of vision-based crowd counting, object enumeration, and domain-specific tasks (e.g., cell counting, tree counting from aerial imagery), where transformer-based architectures are deployed and refined for real-world performance. A smaller set of works falls outside these core themes, addressing tangential or unrelated topics.

Within the theoretical branch, a particularly active line of inquiry uses logical languages to characterize what transformers can and cannot express, contrasting the power of attention mechanisms with classical automata and first-order logic extensions. Counting Power[0] sits squarely in this logic-based expressiveness cluster, closely aligned with Logical Languages[16] and Graph Logic[49], which similarly formalize counting capabilities through logical frameworks. Compared to Tighter Bounds[23], which emphasizes circuit-theoretic lower bounds, Counting Power[0] focuses more directly on the interplay between transformer depth, attention structure, and the ability to solve counting problems defined in logical terms. Meanwhile, empirical studies like When Count[1] and Teaching Arithmetic[18] probe whether trained models exhibit the counting behaviors predicted by theory, highlighting an ongoing tension between what transformers can represent in principle and what they reliably learn from data. This landscape reveals open questions about bridging formal guarantees with practical learnability and scaling.

Claimed Contributions

Formal framework for counting power of transformers

The authors introduce a formal framework to study the expressiveness of transformers for counting properties, which are permutation-closed languages where acceptance depends only on the number of occurrences of tokens rather than their order.
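The definition above can be made concrete with a small sketch: a counting property accepts a word based solely on its Parikh vector (the vector of token counts), so membership is invariant under any permutation of the input. The predicate below (more a's than b's) is our own toy illustration, not one taken from the paper.

```python
from collections import Counter

def parikh(word):
    """Map a word to its token-count vector (Parikh vector)."""
    return Counter(word)

def more_as_than_bs(word):
    """A toy counting property: accept iff #a > #b."""
    c = parikh(word)
    return c["a"] > c["b"]

# Permutation-closure: every reordering of an accepted word is also accepted,
# because the predicate only ever sees the counts, never the positions.
assert more_as_than_bs("aab") == more_as_than_bs("aba") == more_as_than_bs("baa")
```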

Retrieved papers: 10
Transformers capture semialgebraic counting properties

The main theoretical result demonstrates that transformers can express counting properties defined by boolean combinations of polynomial inequalities of arbitrary degree, generalizing beyond the linear counting properties captured by existing models like C-RASP.
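The linear-versus-semialgebraic distinction can be sketched with two toy predicates of our own (not drawn from the paper): a semilinear counting property uses only linear inequalities over token counts, while a semialgebraic one may combine polynomial inequalities of arbitrary degree.

```python
from collections import Counter

def linear_property(word):
    """Semilinear toy example: #a - 2*#b >= 0 (a degree-1 inequality)."""
    c = Counter(word)
    return c["a"] - 2 * c["b"] >= 0

def semialgebraic_property(word):
    """Semialgebraic toy example: (#a)^2 >= #b * #c (a degree-2 inequality)."""
    c = Counter(word)
    return c["a"] ** 2 >= c["b"] * c["c"]

# Both depend only on token counts, so both are counting properties;
# only the second requires a nonlinear polynomial over the counts.
```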

Retrieved papers: 4 (can refute)
Characterization via uniform average hard attention transformers

The authors prove that NoPE-AHAT and NoPE-AHAT[U] (transformers without positional encodings using average hard attention with uniform layers) precisely capture semialgebraic counting properties, providing a complete characterization of this expressiveness class.
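A minimal numerical sketch of one average-hard-attention step, assuming the standard definition in the AHAT literature: attention mass is split uniformly over the positions that attain the maximum score, instead of applying a softmax. Shapes and variable names here are our own illustration, not the paper's notation.

```python
import numpy as np

def average_hard_attention(scores, values):
    """One AHAT step.

    scores: (n,) attention scores for the n positions.
    values: (n, d) value vectors.
    Returns the (d,) output: the uniform average of the value
    vectors at the positions tied for the maximum score.
    """
    best = np.isclose(scores, scores.max())   # positions tied for the max
    weights = best / best.sum()               # uniform weight over the ties
    return weights @ values                   # attended output

scores = np.array([1.0, 3.0, 3.0, 0.5])
values = np.eye(4)
out = average_hard_attention(scores, values)  # averages rows 1 and 2
```

With `values = np.eye(4)` the output directly exposes the attention weights: positions 1 and 2 share the maximum score, so each receives weight 1/2.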

Retrieved papers: 6 (can refute)

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution
