Characterizing Pattern Matching and Its Limits on Compositional Task Structures

ICLR 2026 Conference Submission
Anonymous Authors

Keywords: pattern matching, compositional generalization, functional equivalence, coverage, path ambiguity, mechanistic interpretability
Abstract:

Despite impressive capabilities, LLMs' successes often rely on pattern-matching behaviors, yet these same behaviors are linked to out-of-distribution (OOD) generalization failures on compositional tasks. Behavioral studies, however, commonly employ task setups that admit multiple generalization sources (e.g., algebraic invariances, structural repetition), obscuring a precise and testable account of how well LLMs generalize through pattern matching and where that mechanism breaks down. To resolve this ambiguity, we first formalize pattern matching as functional equivalence, i.e., identifying pairs of input subsequences that consistently lead to identical results when the rest of the input is held constant. We then systematically study how decoder-only Transformer and Mamba models behave on controlled tasks with compositional structures that isolate this mechanism. Our formalism yields predictive and quantitative insights: (1) Instance-wise success of pattern matching is well predicted by the number of contexts witnessing the relevant functional equivalence. (2) We prove a tight sample complexity bound for learning a two-hop structure by identifying the exponent of the data scaling law for perfect in-domain generalization; our empirical results align with the theoretical prediction under 20× parameter scaling and across architectures. (3) Path ambiguity is a structural barrier: when a variable influences the output via multiple paths, models fail to form unified intermediate state representations, impairing both accuracy and interpretability. (4) Chain-of-Thought supervision reduces data requirements but does not resolve path ambiguity. We thus provide a predictive, falsifiable boundary for pattern matching and a foundational diagnostic for disentangling mixed generalization mechanisms.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper formalizes pattern matching as functional equivalence and derives tight sample complexity bounds for two-hop compositional structures, positioning itself within the 'Formalization and Sample Complexity' leaf of the taxonomy. This leaf contains only two papers, including the original work, indicating a relatively sparse research direction focused on rigorous theoretical analysis. The paper's emphasis on provable bounds and quantitative predictions distinguishes it from the broader empirical literature on compositional generalization, which dominates other branches of the taxonomy.

The taxonomy reveals substantial activity in neighboring areas: 'Transformer and Attention Mechanisms' examines architectural inductive biases, 'Data Augmentation and Synthesis' explores training-centric solutions, and 'Systematic Generalization Patterns' characterizes empirical failure modes. The original paper's theoretical approach contrasts with these empirical and architectural threads, offering formal guarantees rather than scalable heuristics. Its focus on functional equivalence as a unifying principle bridges the gap between mechanistic understanding (explored in 'Neural and Cognitive Mechanisms') and practical generalization (studied in 'Architecture-Specific Studies'), though it remains firmly grounded in formal analysis rather than architectural design.

Among 24 candidates examined across three contributions, no refutable prior work was identified. The formalization of pattern matching as functional equivalence examined 10 candidates with no overlaps; the sample complexity bound for two-hop tasks examined 6 candidates with no refutations; and the path ambiguity analysis examined 8 candidates without finding prior coverage. This suggests that within the limited search scope—focused on top semantic matches and citation expansion—the paper's specific theoretical contributions appear distinct. However, the small candidate pool (24 total) and the sparse theoretical leaf (2 papers) mean the analysis captures a narrow slice of the literature, leaving open the possibility of related work in adjacent formal methods or complexity theory communities.

Given the limited search scope and the paper's placement in a sparsely populated theoretical leaf, the contributions appear novel within the examined context. The formalization and sample complexity results address a gap between empirical studies of compositional generalization and rigorous theoretical characterization. However, the analysis does not cover exhaustive searches in formal learning theory or adjacent mathematical frameworks, so the assessment reflects novelty within the specific compositional generalization literature surveyed rather than across all relevant theoretical domains.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 24
Refutable Papers: 0

Research Landscape Overview

Core task: pattern matching in compositional generalization tasks. The field examines how models learn to combine familiar components in novel ways, a challenge that spans semantic parsing, vision-language reasoning, and algorithmic tasks. The taxonomy reflects a multifaceted landscape:

- Theoretical Foundations and Mechanisms explore formal guarantees and sample complexity bounds, as in Pattern Matching Limits[0] and Coverage Principle[35].
- Architecture-Specific Studies investigate how transformers, graph networks, and specialized modules handle compositional structure, with works like RASP COGS Transformers[50] and Multimodal Graph Networks[8] probing inductive biases.
- Training Methods and Data Strategies address curriculum design and augmentation techniques, exemplified by Sampling Diverse Training[30] and Latent Structure Augmentation[18].
- Representation Learning and Alignment focuses on aligning latent spaces and grounding symbols, seen in Latent Surface Alignment[11] and Grounded Language Models[27].
- Domain-Specific Applications tailor compositional reasoning to text-to-SQL, code search, and vision tasks, including Compositional Text-to-SQL[17] and Compositional Code Search[13].
- Empirical Analysis and Benchmarking rigorously evaluates generalization gaps on datasets like COGS and ARC, as in Noise Impact ARC[21].
- Specialized Compositional Structures study copatterns, hierarchical decompositions, and other structured representations, with CoScheme Copatterns[5] and Compositional Copatterns[32].

Several active lines of work highlight contrasting emphases and open questions. One thread pursues provable guarantees on when pattern matching suffices for compositional tasks, balancing expressiveness with learnability; another investigates architectural interventions, such as attention supervision or modular designs, that encourage systematic recombination without explicit symbolic scaffolding. A third direction explores data-centric strategies, asking whether diverse sampling or augmentation can substitute for architectural inductive biases.

Pattern Matching Limits[0] sits squarely within the theoretical branch, formalizing sample complexity and expressiveness boundaries for pattern-based learners. Its emphasis on rigorous bounds complements empirical studies like Neural Compositional Generalization[1] and contrasts with data-driven approaches such as Data Compositional Generalization[3], which prioritize scalable training recipes over worst-case analysis. Together, these perspectives frame ongoing debates about whether compositional generalization demands principled architectural priors, richer training regimes, or fundamentally new learning paradigms.

Claimed Contributions

Formalization of pattern matching as functional equivalence

The authors introduce a model-agnostic, data-centric definition of pattern matching by formalizing it as functional equivalence. This framework defines when two input subsequences are functionally equivalent based on consistent behavior in shared contexts, and introduces the concept of k-coverage to predict which test inputs can be reliably handled through pattern matching.

10 retrieved papers
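The functional-equivalence and k-coverage definitions can be sketched in code. The sketch below is a minimal illustration, not the paper's construction: it assumes a toy two-hop task output = h2(h1(a, b), c) over a small token set, and the names (`h1`, `h2`, `task`, `coverage`) and the exact counting rule are hypothetical, chosen only to make the idea concrete.

```python
# Hypothetical two-hop task over a small token set (illustrative only):
# output = h2(h1(a, b), c). Two subsequences (a, b) and (a', b') are
# functionally equivalent when swapping one for the other leaves the
# output unchanged in every shared context c.
TOKENS = range(5)

def h1(a, b):
    return (a + b) % 5

def h2(m, c):
    return (m * 3 + c) % 5

def task(a, b, c):
    return h2(h1(a, b), c)

def functionally_equivalent(s1, s2, contexts):
    """True iff subsequences s1 and s2 yield identical outputs in every
    shared context (the formal definition quantifies over all contexts
    observed with both subsequences)."""
    return all(task(*s1, c) == task(*s2, c) for c in contexts)

def coverage(s_test, train_set):
    """Count training examples whose subsequence is functionally
    equivalent to s_test in that example's context: a rough proxy for
    the k in k-coverage (the paper's definition may differ in detail)."""
    return sum(
        1
        for (a, b, c) in train_set
        if (a, b) != s_test and functionally_equivalent((a, b), s_test, [c])
    )
```

On this toy task, (0, 2) and (1, 1) are functionally equivalent (both map to the same intermediate under `h1`), while (0, 1) and (1, 1) are not; the prediction tested in the paper is that a model's success on a held-out subsequence grows with its coverage count.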
Tight sample complexity bound for two-hop compositional tasks

The authors establish and prove a theoretical bound (Theorem 6.1) showing that the training dataset size required for perfect in-domain generalization on two-hop tasks scales polynomially with token set size, with exponent c in [2, 2.5). This bound is empirically verified across different model sizes and architectures.

6 retrieved papers
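The claimed scaling law says the training-set size N* needed for perfect in-domain generalization grows polynomially in the token-set size |T|, with exponent c in [2, 2.5). Such an exponent is typically estimated as the slope of a log-log regression; the sketch below shows that procedure on made-up measurements generated from an exact power law (the numbers are illustrative, not the paper's data).

```python
import math

def fit_exponent(token_sizes, dataset_sizes):
    """Least-squares slope of log N* against log |T|, i.e. the exponent
    c in a power law N* ~ A * |T|**c."""
    xs = [math.log(t) for t in token_sizes]
    ys = [math.log(n) for n in dataset_sizes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )

# Illustrative measurements drawn from an exact N* = |T|**2.3 law
# (hypothetical values inside the claimed range [2, 2.5)):
sizes = [20, 40, 80, 160]
needed = [t ** 2.3 for t in sizes]
c_hat = fit_exponent(sizes, needed)  # recovers 2.3 up to float error
```

In practice the measured N* values are noisy, so the fitted exponent is checked against the predicted interval [2, 2.5) rather than a point value; the paper reports that this holds across a 20× parameter range and across architectures.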
Identification of path ambiguity as a structural barrier

The authors identify and characterize path ambiguity as a failure mode where variables affecting outputs through multiple computational paths prevent models from forming unified intermediate representations. This structural problem persists even with Chain-of-Thought supervision and impairs both generalization performance and model interpretability.

8 retrieved papers
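Path ambiguity can be made concrete with a toy task in which one variable reaches the output along two computational paths: no value of the intended intermediate alone then summarizes that variable's influence. The construction below is my own illustrative sketch (function names and the mod-7 arithmetic are hypothetical, not the paper's tasks); it checks whether the intended intermediate m = h1(x, a) "screens off" x from the output.

```python
from itertools import product

def single_path(x, a, b):
    # x reaches the output only through the intermediate m.
    m = (x + a) % 7
    return (m * 2 + b) % 7

def two_path(x, a, b):
    # x reaches the output twice: via m and via a direct second path.
    m = (x + a) % 7
    return (m * 2 + b + x) % 7

def intermediate_is_unified(task, h1, n=7):
    """Check whether m = h1(x, a) screens x off from the output: any two
    (x, a) pairs sharing the same m must be indistinguishable downstream.
    If not, no unified intermediate state of this form exists, which is
    the structural signature of path ambiguity."""
    pairs = list(product(range(n), repeat=2))
    for (x1, a1), (x2, a2) in product(pairs, repeat=2):
        if h1(x1, a1) == h1(x2, a2) and any(
            task(x1, a1, b) != task(x2, a2, b) for b in range(n)
        ):
            return False
    return True

def intended_h1(x, a):
    return (x + a) % 7
```

For `single_path` the check passes, so a model can compress (x, a) into one intermediate state; for `two_path` it fails (e.g. (0, 1) and (1, 0) share m = 1 but yield different outputs), matching the paper's observation that such tasks resist unified intermediate representations even with Chain-of-Thought supervision.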

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution
Formalization of pattern matching as functional equivalence

Contribution
Tight sample complexity bound for two-hop compositional tasks

Contribution
Identification of path ambiguity as a structural barrier