Discovering alternative solutions beyond the simplicity bias in recurrent neural networks

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: recurrent neural networks, computational neuroscience, dynamical systems
Abstract:

Training recurrent neural networks (RNNs) to perform neuroscience-style tasks has become a popular way to generate hypotheses for how neural circuits in the brain might perform computations. Recent work has demonstrated that task-trained RNNs possess a strong simplicity bias. In particular, this inductive bias often causes RNNs trained on the same task to collapse onto effectively the same solution, typically composed of fixed-point attractors or other low-dimensional dynamical motifs. While such solutions are readily interpretable, this collapse is counterproductive when the goal is to generate a set of genuinely distinct hypotheses for how neural computations might be performed. Here we propose Iterative Neural Similarity Deflation (INSD), a simple method to break this inductive bias. By penalizing the linear predictivity of neural activity produced by standard task-trained RNNs, we find an alternative class of solutions to classic neuroscience-style RNN tasks. These solutions appear distinct across a battery of analysis techniques, including representational similarity metrics, dynamical systems analysis, and the linear decodability of task-relevant variables. Moreover, these alternative solutions can sometimes achieve superior performance in difficult or out-of-distribution task regimes. Our findings underscore the importance of moving beyond the simplicity bias to uncover richer and more varied models of neural computation.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces Iterative Neural Similarity Deflation (INSD), a method to break the simplicity bias in task-trained RNNs by penalizing linear predictivity of neural activity. It sits within the 'Quantifying and Controlling Solution Degeneracy' leaf, which contains only three papers total. This leaf focuses specifically on methods for measuring and manipulating solution degeneracy across behavioral, dynamical, and weight-space levels. The sparse population suggests this is an emerging rather than saturated research direction, with relatively few prior works directly addressing controlled generation of diverse RNN solutions.

The taxonomy reveals that solution diversity research connects to several neighboring areas. The sibling leaf 'Discovery of Multiple Algorithmic Strategies' (three papers) examines qualitatively different algorithms emerging naturally, while 'Universality and Individuality' (one paper) studies shared versus unique representations across RNN populations. The broader 'Mechanistic Interpretation' branch explores dynamical systems analysis and working memory coding strategies—analytical tools that INSD leverages to characterize discovered solutions. The paper bridges degeneracy control methods with mechanistic interpretation techniques, positioning itself at the intersection of generating diversity and analyzing what makes solutions genuinely distinct.

Among the thirty candidates examined, the contribution-level analysis shows mixed novelty signals. For the INSD method itself (Contribution 1), ten candidates were examined and one refutable match was found, suggesting some methodological overlap exists within the limited search scope. Similarly, for the discovery of alternative solutions beyond the simplicity bias (Contribution 2), one refutable candidate was found among the ten examined. The framework for generating diverse computational hypotheses (Contribution 3) showed no refutable matches across its ten candidates, indicating this framing may be more distinctive. These statistics reflect a focused but not exhaustive literature search, leaving open whether additional relevant work exists beyond the top thirty semantic matches.

Given the sparse taxonomy leaf and limited search scope, the work appears to address a genuine gap in controlled diversity generation for task-trained RNNs. The single refutable match found for each of the first two contributions suggests some methodological precedent exists, but the overall scarcity of papers in this specific research direction indicates the problem remains relatively underexplored. The analysis captures the top semantic matches but cannot rule out relevant work in adjacent communities or under different terminological framings.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 2

Research Landscape Overview

Core task: Discovering diverse solutions in task-trained recurrent neural networks. The field explores how RNNs trained on identical tasks can arrive at qualitatively different internal mechanisms—a phenomenon known as solution degeneracy.

The taxonomy reflects a broad landscape organized around several major themes. One branch examines solution diversity and degeneracy directly, investigating how multiple distinct circuit-level strategies emerge from the same objective and how researchers can quantify or control this variability (e.g., Alternative Solutions Simplicity[0], Solution Degeneracy Control[25]). A second branch focuses on mechanistic interpretation, dissecting the internal dynamics and representational structure that task-optimized RNNs develop (Cognitive Strategies Discovery[3], Clock Pizza Mechanistic[5]). A third branch draws comparisons between RNN solutions and biological neural systems, asking whether artificial networks recapitulate known neural coding schemes or circuit motifs (Dynamic Coding Memory[6], Nodes to Networks[7]). Additional branches address architecture design, methodological tools for analysis, and a wide range of application domains—from forecasting and classification to computer vision and resource scheduling—demonstrating that the core questions about solution diversity arise across many practical settings.

Within the solution diversity and degeneracy branch, a particularly active line of work seeks to chart the space of possible solutions and develop principled ways to sample or steer networks toward simpler or more interpretable configurations. Alternative Solutions Simplicity[0] sits squarely in this cluster, emphasizing methods to discover and compare alternative circuit implementations that solve the same task. Nearby efforts such as Solution Degeneracy Control[25] and Charting Solution Space[29] share a focus on mapping out the landscape of degenerate solutions and understanding the factors—initialization schemes, regularization, or architectural constraints—that bias networks toward one solution over another. A key open question is whether certain solutions generalize better or align more closely with biological plausibility, and how one might systematically favor such solutions during training. By situating itself among these works, Alternative Solutions Simplicity[0] contributes tools for navigating the rich, often redundant space of learned representations, helping researchers move beyond single-solution analyses toward a more complete picture of what task-trained RNNs can learn.

Claimed Contributions

Iterative Neural Similarity Deflation (INSD) method

The authors introduce INSD, a training procedure that penalizes linear predictivity of neural activity from previously trained RNNs in an iterative manner. This method enables discovery of alternative task solutions that diverge from the prototypical solutions typically found due to simplicity bias in RNNs.

10 retrieved papers · Can Refute
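
The report does not reproduce the authors' implementation, but the core idea, an auxiliary loss that penalizes how well a linear map predicts a reference network's activity, can be sketched in a few lines. The snippet below is a minimal PyTorch illustration under our own assumptions: the function names, the closed-form ridge-regression formulation, and the direction of the mapping are ours, not the paper's.

```python
import torch

def linear_predictivity(h_new, h_ref, lam=1e-3):
    """Differentiable R^2 of a ridge regression that predicts a frozen
    reference network's hidden states from the new network's states.

    h_new: (samples, units_new) activity of the network being trained,
           flattened over trials and timesteps.
    h_ref: (samples, units_ref) activity of a previously trained RNN.
    """
    X = h_new - h_new.mean(dim=0, keepdim=True)   # center predictors
    Y = h_ref - h_ref.mean(dim=0, keepdim=True)   # center targets
    d = X.shape[1]
    # Closed-form ridge solution: W = (X^T X + lam*I)^{-1} X^T Y
    gram = X.T @ X + lam * torch.eye(d, device=X.device)
    W = torch.linalg.solve(gram, X.T @ Y)
    resid = Y - X @ W
    # R^2 = 1 - residual variance / total variance of the targets
    return 1.0 - resid.pow(2).sum() / Y.pow(2).sum().clamp(min=1e-8)

def insd_loss(task_loss, h_new, h_refs, beta=1.0):
    """Task loss plus a penalty for being linearly predictive of any
    previously found solution's activity (reference states frozen)."""
    penalty = sum(linear_predictivity(h_new, h.detach()) for h in h_refs)
    return task_loss + beta * penalty
```

In a training loop, this penalty would be added to the task objective, pushing the new network's activity out of the linear span of previously found solutions while the task loss keeps it performing the task.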
Discovery of alternative RNN solutions beyond simplicity bias

The authors demonstrate that their method uncovers a distinct class of solutions to neuroscience tasks that differ from standard solutions in representational geometry, dynamical motifs, and encoding of task variables. These alternative solutions forgo fixed-point attractors and instead maintain information in dynamically evolving subspaces.

10 retrieved papers · Can Refute
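
Of the analyses listed above, the linear decodability of task variables is the most straightforward to illustrate. The sketch below is a hypothetical protocol rather than the paper's exact analysis: it fits a cross-validated ridge probe at each timestep. Comparing how a decoder trained at one time transfers to other times is a standard way to distinguish static attractor codes from dynamically evolving ones.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

def decodability_over_time(hidden, target, cv=5):
    """Cross-validated R^2 of a linear readout of a task variable from
    hidden activity at each timestep.

    hidden: (trials, time, units) array of RNN states.
    target: (trials,) task variable, e.g. a remembered stimulus value.
    Returns a (time,) array of mean cross-validated scores.
    """
    n_trials, n_time, _ = hidden.shape
    scores = np.empty(n_time)
    for t in range(n_time):
        probe = RidgeCV(alphas=np.logspace(-3, 3, 13))
        scores[t] = cross_val_score(
            probe, hidden[:, t, :], target, cv=cv, scoring="r2"
        ).mean()
    return scores
```

Under this reading, a fixed-point solution would support a single decoder that works at every timestep, whereas a solution that maintains information in a dynamically evolving subspace would require timestep-specific decoders even though decodability at each individual time remains high.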
Framework for generating diverse computational hypotheses in neuroscience

The authors address the problem of generating multiple competing hypotheses for neural computation by developing a method that overcomes the collapse onto a single prototypical solution. Their approach enables production of genuinely distinct solutions that can be evaluated against experimental data, moving beyond the limitations of varying only basic hyperparameters.

10 retrieved papers · No refutable match
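
To check that candidate hypotheses are genuinely distinct rather than reparameterizations of one another, a representational similarity score can be computed between pairs of solutions. The function below implements linear centered kernel alignment (CKA), one common choice; using it for this purpose is our illustration, not necessarily the metric the authors use.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear centered kernel alignment between two activity matrices
    of shape (samples, units); the unit counts of X and Y may differ.
    Returns a value in [0, 1], where 1 means the representations are
    identical up to rotation and isotropic scaling.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # CKA(X, Y) = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, ord="fro")
    norm_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return cross / (norm_x * norm_y)
```

A deflated solution scoring low CKA against the standard solution, while two independently retrained standard networks score high against each other, would support the claim that the discovered solutions are genuinely new rather than noise around the same prototype.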

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1: Iterative Neural Similarity Deflation (INSD) method (described above; 10 retrieved papers, one refutable match).

Contribution 2: Discovery of alternative RNN solutions beyond simplicity bias (described above; 10 retrieved papers, one refutable match).

Contribution 3: Framework for generating diverse computational hypotheses in neuroscience (described above; 10 retrieved papers, no refutable matches).