Trapped by simplicity: When Transformers fail to learn from noisy features

ICLR 2026 Conference SubmissionAnonymous Authors

OpenReview Score: 6.0 Download Report PDF

boolean analysissimplicity biastransformerfeature noise

Noise is ubiquitous in data used to train large language models, but it is not well understood whether these models are able to correctly generalize to inputs generated without noise. Here, we study noise-robust learning: are transformers trained on data with noisy features able to find a target function that correctly predicts labels for noiseless features? We show that transformers succeed at noise-robust learning for a selection of $k$ -sparse parity and majority functions, compared to LSTMs which fail at this task for even modest feature noise. However, we find that transformers typically fail at noise-robust learning of random $k$ -juntas, especially when the boolean sensitivity of the optimal solution is smaller than that of the target function. We argue that this failure is due to a combination of two factors: transformers' bias toward simpler functions, combined with an observation that the empirically optimal function for noise-robust learning has lower sensitivity than the target function. We test this hypothesis by exploiting transformers' simplicity bias to trap them in an incorrect solution, but show that transformers can escape this trap by training with an additional loss term penalizing high-sensitivity solutions. Overall, we find that transformers are particularly ineffective for learning boolean functions in the presence of feature noise.

Abstract:

Disclaimer

This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.

NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.

If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper investigates whether transformers trained on noisy Boolean features can generalize to clean inputs, focusing on k-sparse parity, majority, and random k-junta functions. It resides in the 'Transformer Simplicity Bias and Noise Robustness' leaf, which contains only two papers total. This represents a sparse, emerging research direction within the broader 'Neural Network Learning of Boolean Functions' branch. The limited population of this leaf suggests the specific intersection of transformer architectures, simplicity bias, and noise-robust Boolean learning remains relatively unexplored compared to adjacent areas like symbolic learning or theoretical complexity analysis.

The taxonomy reveals neighboring work in symbolic regression and sparse polynomial learning, which pursue interpretable Boolean formulas through non-neural methods, and explainable neural approaches like Boolformer that learn interpretable DNFs. The paper diverges from these by examining inductive biases rather than interpretability mechanisms. It also connects to theoretical foundations studying noise sensitivity and approximation properties of Boolean functions, though those works analyze intrinsic function characteristics rather than neural network learning dynamics. The scope note for this leaf explicitly excludes symbolic regression, positioning the work as distinctly neural-centric within the classical learning paradigm.

Among the three contributions analyzed, the empirical demonstration examined two candidates with one appearing to provide overlapping prior work, while the simplicity bias explanation examined ten candidates with one potential refutation. The sensitivity penalty intervention examined ten candidates with none clearly refuting it. These statistics reflect a limited search scope of twenty-two total candidates, not an exhaustive literature review. The first two contributions show some prior overlap within this constrained sample, suggesting related empirical observations or theoretical frameworks exist, while the third contribution appears more distinctive among the examined papers.

Based on the limited search covering top semantic matches, the work appears to occupy a sparsely populated research direction with some conceptual overlap in explaining transformer behavior on Boolean tasks. The analysis does not cover the full landscape of transformer learning theory or Boolean function complexity, focusing instead on papers semantically proximate to noise robustness and simplicity bias. The taxonomy structure indicates this specific intersection remains less crowded than adjacent areas like quantum learning or Boolean network control.

Taxonomy

Core-task Taxonomy Papers

Claimed Contributions

Contribution Candidate Papers Compared

Refutable Paper

Research Landscape Overview

Core task: noise-robust learning of boolean functions from noisy features. The field encompasses diverse approaches to understanding and learning Boolean functions under various forms of noise and perturbation. At the highest level, the taxonomy reveals five major branches: quantum and quantum-inspired methods that leverage quantum computing principles for learning tasks; classical learning paradigms grounded in statistical query complexity, neural network architectures, and traditional machine learning theory; Boolean network dynamics focusing on control, stability, and synchronization under perturbations; theoretical foundations examining intrinsic properties like sensitivity, robustness measures, and structural characteristics of Boolean functions; and application-driven work deploying these ideas in domains from hardware design to biological network modeling. Works like Quantum Boolean Learning[1] and Quantum Learning Robust[6] illustrate the quantum branch, while classical approaches range from hardness results such as Statistical Query Lower Bounds[2] to neural methods like Boolformer[23] and Differentiable Matricized DNFs[17]. Several active lines of research reveal key trade-offs and open questions. One prominent theme contrasts theoretical hardness results—showing fundamental limits on learning certain function classes in noisy settings—with practical neural network methods that empirically succeed on structured Boolean tasks. Another tension appears between control-theoretic perspectives on Boolean network stability (e.g., Robust Cluster Synchronization[3], Robust Flipping Stabilization[5]) and learning-theoretic views emphasizing generalization from noisy samples. Trapped by Simplicity[0] sits within the neural network learning cluster, closely aligned with Simplicity Bias Transformers[14], both examining how transformer architectures exhibit inductive biases toward simpler Boolean functions. While Simplicity Bias Transformers[14] characterizes this bias more generally, Trapped by Simplicity[0] emphasizes the implications for noise robustness, exploring how simplicity preferences interact with feature corruption—a question that bridges the classical learning and theoretical foundations branches.

Claimed Contributions

Empirical demonstration of transformers' mixed success at noise-robust learning

Can Refute

2 retrieved papers

The authors empirically demonstrate that transformers can learn sparse parity and odd-length majority functions from noisy training data, outperforming LSTMs. However, transformers fail to learn random k-juntas robustly despite achieving near-optimal validation accuracy on noisy data.

2 retrieved papers

Can Refute

Explanation linking simplicity bias to noise-robust learning failure

Can Refute

10 retrieved papers

The authors propose that transformers fail at noise-robust learning because their simplicity bias leads them to prefer low-sensitivity solutions, while the optimal predictor for noisy data typically has lower sensitivity than the target function for random boolean functions.

10 retrieved papers

Can Refute

Demonstration of trapping transformers and escape via sensitivity penalty

10 retrieved papers

The authors design a controlled experiment showing that transformers can be trapped into learning a simpler incorrect function when it achieves similar noisy validation accuracy as the target. They demonstrate that adding a sensitivity penalty to the loss function can enable transformers to escape this trap under certain conditions.

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

[14] Simplicity bias in transformers and their ability to learn sparse boolean functions PDF

Satwik Bhattamishra, Arkil Patel, Varun Kanade, Phil Blunsom (2023)

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Empirical demonstration of transformers' mixed success at noise-robust learning

[14] Simplicity bias in transformers and their ability to learn sparse boolean functions PDF

Can Refute

[51] TRAPPED BY SIMPLICITY: WHEN TRANSFORMERS PDF

Cannot Refute

Contribution

Explanation linking simplicity bias to noise-robust learning failure

[65] Simplicity Bias of Transformers to Learn Low Sensitivity Functions PDF

Can Refute

[62] The Pitfalls of Simplicity Bias in Neural Networks PDF

Cannot Refute

[63] A distributional simplicity bias in the learning dynamics of transformers PDF

Cannot Refute

[64] Simplicity Bias in 1-Hidden Layer Neural Networks PDF

Cannot Refute

[66] Simplicity bias of SGD via sharpness minimization PDF

Cannot Refute

[67] Feature reconstruction from outputs can mitigate simplicity bias in neural networks PDF

Cannot Refute

[68] Mitigating Simplicity Bias in Deep Learning for Improved OOD Generalization and Robustness PDF

Cannot Refute

[69] Using noise to infer aspects of simplicity without learning PDF

Cannot Refute

[70] Evading the simplicity bias: Training a diverse set of models discovers solutions with superior ood generalization PDF

Cannot Refute

[71] The shape and simplicity biases of adversarially robust imagenet-trained cnns PDF

Cannot Refute

Contribution

Demonstration of trapping transformers and escape via sensitivity penalty

[52] A Closer Look at Memorization in Deep Networks PDF

Cannot Refute

[53] Implicit self-regularization in deep neural networks: Evidence from random matrix theory and implications for learning PDF

Cannot Refute

[54] Early-learning regularization prevents memorization of noisy labels PDF

Cannot Refute

[55] A survey of regularization strategies for deep models PDF

Cannot Refute

[56] Avoiding overfitting: A survey on regularization methods for convolutional neural networks PDF

Cannot Refute

[57] Regularizing class-wise predictions via self-knowledge distillation PDF

Cannot Refute

[58] Simple and effective regularization methods for training on noisily labeled data with generalization guarantee PDF

Cannot Refute

[59] ARNOR: Attention regularization based noise reduction for distant supervision relation classification PDF

Cannot Refute

[60] AutoDropout: Learning Dropout Patterns to Regularize Deep Networks PDF

Cannot Refute

[61] Why neural networks find simple solutions: the many regularizers of geometric complexity PDF

Cannot Refute

Trapped by simplicity: When Transformers fail to learn from noisy features

Overview

Overall Novelty Assessment

Taxonomy

Research Landscape Overview

Claimed Contributions

Core Task Comparisons

[14] Simplicity bias in transformers and their ability to learn sparse boolean functions PDF

Contribution Analysis

Empirical demonstration of transformers' mixed success at noise-robust learning

[14] Simplicity bias in transformers and their ability to learn sparse boolean functions PDF

[51] TRAPPED BY SIMPLICITY: WHEN TRANSFORMERS PDF

Explanation linking simplicity bias to noise-robust learning failure

[65] Simplicity Bias of Transformers to Learn Low Sensitivity Functions PDF

[62] The Pitfalls of Simplicity Bias in Neural Networks PDF

[63] A distributional simplicity bias in the learning dynamics of transformers PDF

[64] Simplicity Bias in 1-Hidden Layer Neural Networks PDF

[66] Simplicity bias of SGD via sharpness minimization PDF

[67] Feature reconstruction from outputs can mitigate simplicity bias in neural networks PDF

[68] Mitigating Simplicity Bias in Deep Learning for Improved OOD Generalization and Robustness PDF

[69] Using noise to infer aspects of simplicity without learning PDF

[70] Evading the simplicity bias: Training a diverse set of models discovers solutions with superior ood generalization PDF

[71] The shape and simplicity biases of adversarially robust imagenet-trained cnns PDF

Demonstration of trapping transformers and escape via sensitivity penalty

[52] A Closer Look at Memorization in Deep Networks PDF

[53] Implicit self-regularization in deep neural networks: Evidence from random matrix theory and implications for learning PDF

[54] Early-learning regularization prevents memorization of noisy labels PDF

[55] A survey of regularization strategies for deep models PDF

[56] Avoiding overfitting: A survey on regularization methods for convolutional neural networks PDF

[57] Regularizing class-wise predictions via self-knowledge distillation PDF

[58] Simple and effective regularization methods for training on noisily labeled data with generalization guarantee PDF

[59] ARNOR: Attention regularization based noise reduction for distant supervision relation classification PDF

[60] AutoDropout: Learning Dropout Patterns to Regularize Deep Networks PDF

[61] Why neural networks find simple solutions: the many regularizers of geometric complexity PDF

Table of Contents