Trapped by simplicity: When Transformers fail to learn from noisy features

ICLR 2026 Conference Submission. Anonymous Authors.
Keywords: boolean analysis, simplicity bias, transformer, feature noise
Abstract:

Noise is ubiquitous in the data used to train large language models, but it is not well understood whether these models can generalize correctly to inputs generated without noise. Here, we study noise-robust learning: can transformers trained on data with noisy features find a target function that correctly predicts labels for noiseless features? We show that transformers succeed at noise-robust learning for a selection of k-sparse parity and majority functions, whereas LSTMs fail at this task even under modest feature noise. However, we find that transformers typically fail at noise-robust learning of random k-juntas, especially when the boolean sensitivity of the optimal solution is smaller than that of the target function. We argue that this failure is due to a combination of two factors: transformers' bias toward simpler functions, and the observation that the empirically optimal function for noise-robust learning has lower sensitivity than the target function. We test this hypothesis by exploiting transformers' simplicity bias to trap them in an incorrect solution, and show that transformers can escape this trap when trained with an additional loss term penalizing high-sensitivity solutions. Overall, we find that transformers are particularly ineffective at learning boolean functions in the presence of feature noise.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper investigates whether transformers trained on noisy Boolean features can generalize to clean inputs, focusing on k-sparse parity, majority, and random k-junta functions. It resides in the 'Transformer Simplicity Bias and Noise Robustness' leaf, which contains only two papers total. This represents a sparse, emerging research direction within the broader 'Neural Network Learning of Boolean Functions' branch. The limited population of this leaf suggests the specific intersection of transformer architectures, simplicity bias, and noise-robust Boolean learning remains relatively unexplored compared to adjacent areas like symbolic learning or theoretical complexity analysis.

The taxonomy reveals neighboring work in symbolic regression and sparse polynomial learning, which pursue interpretable Boolean formulas through non-neural methods, and explainable neural approaches like Boolformer that learn interpretable DNFs. The paper diverges from these by examining inductive biases rather than interpretability mechanisms. It also connects to theoretical foundations studying noise sensitivity and approximation properties of Boolean functions, though those works analyze intrinsic function characteristics rather than neural network learning dynamics. The scope note for this leaf explicitly excludes symbolic regression, positioning the work as distinctly neural-centric within the classical learning paradigm.

Among the three contributions analyzed, the empirical demonstration examined two candidates with one appearing to provide overlapping prior work, while the simplicity bias explanation examined ten candidates with one potential refutation. The sensitivity penalty intervention examined ten candidates with none clearly refuting it. These statistics reflect a limited search scope of twenty-two total candidates, not an exhaustive literature review. The first two contributions show some prior overlap within this constrained sample, suggesting related empirical observations or theoretical frameworks exist, while the third contribution appears more distinctive among the examined papers.

Based on the limited search covering top semantic matches, the work appears to occupy a sparsely populated research direction with some conceptual overlap in explaining transformer behavior on Boolean tasks. The analysis does not cover the full landscape of transformer learning theory or Boolean function complexity, focusing instead on papers semantically proximate to noise robustness and simplicity bias. The taxonomy structure indicates this specific intersection remains less crowded than adjacent areas like quantum learning or Boolean network control.

Taxonomy

50 core-task taxonomy papers
3 claimed contributions
22 contribution candidate papers compared
2 refutable papers

Research Landscape Overview

Core task: noise-robust learning of boolean functions from noisy features. The field encompasses diverse approaches to understanding and learning Boolean functions under various forms of noise and perturbation. At the highest level, the taxonomy reveals five major branches: quantum and quantum-inspired methods that leverage quantum computing principles for learning tasks; classical learning paradigms grounded in statistical query complexity, neural network architectures, and traditional machine learning theory; Boolean network dynamics focusing on control, stability, and synchronization under perturbations; theoretical foundations examining intrinsic properties like sensitivity, robustness measures, and structural characteristics of Boolean functions; and application-driven work deploying these ideas in domains from hardware design to biological network modeling.

Works like Quantum Boolean Learning[1] and Quantum Learning Robust[6] illustrate the quantum branch, while classical approaches range from hardness results such as Statistical Query Lower Bounds[2] to neural methods like Boolformer[23] and Differentiable Matricized DNFs[17].

Several active lines of research reveal key trade-offs and open questions. One prominent theme contrasts theoretical hardness results, which show fundamental limits on learning certain function classes in noisy settings, with practical neural network methods that empirically succeed on structured Boolean tasks. Another tension appears between control-theoretic perspectives on Boolean network stability (e.g., Robust Cluster Synchronization[3], Robust Flipping Stabilization[5]) and learning-theoretic views emphasizing generalization from noisy samples.

Trapped by Simplicity[0] sits within the neural network learning cluster, closely aligned with Simplicity Bias Transformers[14], both examining how transformer architectures exhibit inductive biases toward simpler Boolean functions. While Simplicity Bias Transformers[14] characterizes this bias more generally, Trapped by Simplicity[0] emphasizes the implications for noise robustness, exploring how simplicity preferences interact with feature corruption, a question that bridges the classical learning and theoretical foundations branches.

Claimed Contributions

Empirical demonstration of transformers' mixed success at noise-robust learning

The authors empirically demonstrate that transformers can learn sparse parity and odd-length majority functions from noisy training data, outperforming LSTMs. However, transformers fail to learn random k-juntas robustly despite achieving near-optimal validation accuracy on noisy data.

2 retrieved papers
Can Refute
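The three function classes named in this contribution can be made concrete with a minimal sketch. The helper names and the bit-flip noise model below are illustrative assumptions, not the paper's code:

```python
import random
from itertools import product

def sparse_parity(x, S):
    # k-sparse parity: XOR of the bits of x indexed by S
    return sum(x[i] for i in S) % 2

def majority(x, S):
    # majority vote over an odd-length index set S
    return int(2 * sum(x[i] for i in S) > len(S))

def random_junta(S, rng):
    # random k-junta: an arbitrary truth table over the |S| relevant bits
    table = {bits: rng.randint(0, 1)
             for bits in product((0, 1), repeat=len(S))}
    return lambda x: table[tuple(x[i] for i in S)]

def flip_noise(x, p, rng):
    # feature noise: flip each input bit independently with probability p
    return tuple(b ^ (rng.random() < p) for b in x)
```

Noise-robust learning, in these terms, means training on pairs (flip_noise(x, p, rng), f(x)) while being evaluated on clean pairs (x, f(x)).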
Explanation linking simplicity bias to noise-robust learning failure

The authors propose that transformers fail at noise-robust learning because their simplicity bias leads them to prefer low-sensitivity solutions: for random boolean functions, the optimal predictor on noisy data typically has lower sensitivity than the target function itself.

10 retrieved papers
Can Refute
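Boolean sensitivity, the quantity this explanation turns on, can be computed by brute force for small input dimension n. A minimal sketch (the helper names are mine, not the paper's):

```python
from itertools import product

def sensitivity_at(f, x):
    # number of coordinates whose single-bit flip changes f(x)
    count = 0
    for i in range(len(x)):
        y = list(x)
        y[i] ^= 1
        count += f(tuple(y)) != f(x)
    return count

def average_sensitivity(f, n):
    # mean sensitivity over all 2^n inputs (the total influence of f)
    points = list(product((0, 1), repeat=n))
    return sum(sensitivity_at(f, x) for x in points) / len(points)

# parity is maximally sensitive: flipping any bit changes the output,
# so its average sensitivity equals n; a constant function scores 0
parity = lambda x: sum(x) % 2
```

Under this measure, the claimed failure mode is that the noise-optimal predictor has a lower average sensitivity than the target, and the transformer's simplicity bias pulls it toward that lower-sensitivity function rather than the target.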
Demonstration of trapping transformers and escape via sensitivity penalty

The authors design a controlled experiment showing that transformers can be trapped into learning a simpler incorrect function when it achieves similar noisy validation accuracy as the target. They demonstrate that adding a sensitivity penalty to the loss function can enable transformers to escape this trap under certain conditions.

10 retrieved papers
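The escape mechanism, an extra loss term penalizing high-sensitivity solutions, can be sketched as a stochastic estimate of the model's average sensitivity on a batch. This is a hedged reconstruction under assumptions (the `predict` interface and the single-random-flip estimator are mine), not the authors' implementation:

```python
import numpy as np

def sensitivity_penalty(predict, X, rng):
    # Estimate the model's average sensitivity on a batch of boolean
    # inputs X (shape [batch, n]): flip one uniformly random coordinate
    # per example and penalize the squared change in the output.
    # Scaling by n makes the estimate comparable to total influence.
    batch, n = X.shape
    idx = rng.integers(0, n, size=batch)
    X_flip = X.copy()
    X_flip[np.arange(batch), idx] ^= 1
    return n * np.mean((predict(X) - predict(X_flip)) ** 2)

# the total training loss would then take the form
#   task_loss + lam * sensitivity_penalty(predict, X, rng)
# where lam trades off fit against the sensitivity of the learned function
```

For a parity predictor every single-bit flip changes the output, so the penalty is maximal, while a constant predictor incurs zero penalty, which is the gradient signal that lets the model climb out of the low-sensitivity trap.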

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution
