Trapped by simplicity: When Transformers fail to learn from noisy features
Overview
Overall Novelty Assessment
The paper investigates whether transformers trained on noisy Boolean features can generalize to clean inputs, focusing on k-sparse parity, majority, and random k-junta functions. It resides in the 'Transformer Simplicity Bias and Noise Robustness' leaf, which contains only two papers total. This represents a sparse, emerging research direction within the broader 'Neural Network Learning of Boolean Functions' branch. The limited population of this leaf suggests the specific intersection of transformer architectures, simplicity bias, and noise-robust Boolean learning remains relatively unexplored compared to adjacent areas like symbolic learning or theoretical complexity analysis.
The taxonomy reveals neighboring work in symbolic regression and sparse polynomial learning, which pursue interpretable Boolean formulas through non-neural methods, and explainable neural approaches like Boolformer that learn interpretable DNFs. The paper diverges from these by examining inductive biases rather than interpretability mechanisms. It also connects to theoretical foundations studying noise sensitivity and approximation properties of Boolean functions, though those works analyze intrinsic function characteristics rather than neural network learning dynamics. The scope note for this leaf explicitly excludes symbolic regression, positioning the work as distinctly neural-centric within the classical learning paradigm.
Among the three contributions analyzed, the search examined two candidates for the empirical demonstration, of which one appears to contain overlapping prior work; ten candidates for the simplicity-bias explanation, of which one is a potential refutation; and ten candidates for the sensitivity-penalty intervention, none of which clearly refutes it. These statistics reflect a limited search scope of twenty-two total candidates, not an exhaustive literature review. The first two contributions show some overlap within this constrained sample, suggesting that related empirical observations or theoretical frameworks exist, while the third contribution appears more distinctive among the examined papers.
Based on the limited search covering top semantic matches, the work appears to occupy a sparsely populated research direction with some conceptual overlap in explaining transformer behavior on Boolean tasks. The analysis does not cover the full landscape of transformer learning theory or Boolean function complexity, focusing instead on papers semantically proximate to noise robustness and simplicity bias. The taxonomy structure indicates this specific intersection remains less crowded than adjacent areas like quantum learning or Boolean network control.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors empirically demonstrate that transformers can learn sparse parity and odd-length majority functions from noisy training data, outperforming LSTMs. However, transformers fail to learn random k-juntas robustly despite achieving near-optimal validation accuracy on noisy data.
The authors propose that transformers fail at noise-robust learning because their simplicity bias leads them to prefer low-sensitivity solutions: for random Boolean functions, the optimal predictor on noisy data typically has lower sensitivity than the target function, so a simplicity-biased model fits the noisy optimum rather than recovering the target.
The authors design a controlled experiment showing that transformers can be trapped into learning a simpler, incorrect function when it achieves noisy validation accuracy similar to the target's. They demonstrate that adding a sensitivity penalty to the loss function can enable transformers to escape this trap under certain conditions.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[14] Simplicity bias in transformers and their ability to learn sparse Boolean functions
Contribution Analysis
Detailed comparisons for each claimed contribution
Empirical demonstration of transformers' mixed success at noise-robust learning
The authors empirically demonstrate that transformers can learn sparse parity and odd-length majority functions from noisy training data, outperforming LSTMs. However, transformers fail to learn random k-juntas robustly despite achieving near-optimal validation accuracy on noisy data.
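The noisy-training setup behind this contribution can be sketched as follows. This is an illustrative reconstruction, not the paper's code: the bit-width, sparsity `k`, and noise rate are assumed values, and the parity support is fixed to the first `k` coordinates for simplicity.

```python
import random

def make_noisy_parity_dataset(n_samples, n_bits=20, k=3, noise_rate=0.1, seed=0):
    """Label uniform bit strings by a k-sparse parity, then flip each
    label independently with probability noise_rate (label noise)."""
    rng = random.Random(seed)
    support = range(k)  # WLOG: parity over the first k coordinates
    data = []
    for _ in range(n_samples):
        x = tuple(rng.randint(0, 1) for _ in range(n_bits))
        y = sum(x[i] for i in support) % 2  # clean parity label
        if rng.random() < noise_rate:
            y = 1 - y                       # corrupted label
        data.append((x, y))
    return data
```

A model trained on such pairs only ever sees noisy labels; noise-robust generalization is then measured against the clean parity on held-out inputs.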
Explanation linking simplicity bias to noise-robust learning failure
The authors propose that transformers fail at noise-robust learning because their simplicity bias leads them to prefer low-sensitivity solutions: for random Boolean functions, the optimal predictor on noisy data typically has lower sensitivity than the target function, so a simplicity-biased model fits the noisy optimum rather than recovering the target.
[65] Simplicity Bias of Transformers to Learn Low Sensitivity Functions
[62] The Pitfalls of Simplicity Bias in Neural Networks
[63] A distributional simplicity bias in the learning dynamics of transformers
[64] Simplicity Bias in 1-Hidden Layer Neural Networks
[66] Simplicity bias of SGD via sharpness minimization
[67] Feature reconstruction from outputs can mitigate simplicity bias in neural networks
[68] Mitigating Simplicity Bias in Deep Learning for Improved OOD Generalization and Robustness
[69] Using noise to infer aspects of simplicity without learning
[70] Evading the simplicity bias: Training a diverse set of models discovers solutions with superior OOD generalization
[71] The shape and simplicity biases of adversarially robust ImageNet-trained CNNs
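The notion of sensitivity invoked by this explanation can be made concrete with a brute-force calculation. The sketch below (an illustration, not the paper's code) computes the average sensitivity of a Boolean function, i.e. the expected number of coordinates whose flip changes the output, and shows that a sparse parity is always pivotal on every support bit while majority is far less sensitive.

```python
from itertools import product

def average_sensitivity(f, n):
    """Expected number of pivotal coordinates of f over uniform x in {0,1}^n,
    computed by exhaustive enumeration (feasible only for small n)."""
    total = 0
    for x in product((0, 1), repeat=n):
        for i in range(n):
            x_flip = x[:i] + (1 - x[i],) + x[i + 1:]
            if f(x) != f(x_flip):
                total += 1
    return total / 2 ** n

parity3 = lambda x: (x[0] + x[1] + x[2]) % 2  # 3-sparse parity on 5 bits
majority5 = lambda x: int(sum(x) > 2)         # majority on 5 bits

print(average_sensitivity(parity3, 5))   # 3.0: every support bit is always pivotal
print(average_sensitivity(majority5, 5)) # 1.875: a bit is pivotal only on 2-2 splits
```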
Demonstration of trapping transformers and escape via sensitivity penalty
The authors design a controlled experiment showing that transformers can be trapped into learning a simpler, incorrect function when it achieves noisy validation accuracy similar to the target's. They demonstrate that adding a sensitivity penalty to the loss function can enable transformers to escape this trap under certain conditions.
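One way such a regularizer could look: estimate the model's sensitivity by measuring how much its real-valued output changes under single-bit input flips, and fold that estimate into the training loss with some weight. This is a hypothetical sketch, not the paper's implementation; the function name, the squared-difference form, and the single-random-flip Monte-Carlo estimator are assumptions, and the sign and weight with which the term enters the loss would depend on whether one wants to steer the model toward or away from sensitive solutions.

```python
import random

def estimate_sensitivity(predict, xs, n_bits, rng=None):
    """Monte-Carlo estimate of a model's input sensitivity: the mean squared
    change in its output when one uniformly random input bit is flipped.
    `predict` maps a bit tuple to a real-valued score."""
    rng = rng or random.Random(0)
    total = 0.0
    for x in xs:
        i = rng.randrange(n_bits)  # flip one random coordinate
        x_flip = x[:i] + (1 - x[i],) + x[i + 1:]
        total += (predict(x) - predict(x_flip)) ** 2
    return total / len(xs)
```

During training, something like `task_loss + lam * estimate_sensitivity(...)` (or the same term with the opposite sign) would serve as the regularized objective, with `lam` a tunable weight.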