Weight-Space Linear Recurrent Neural Networks

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: physics-informed machine learning, weight-space learning, meta-learning, deep sequence modeling, linear recurrence, test-time training
Abstract:

We introduce WARP (Weight-space Adaptive Recurrent Prediction), a simple yet powerful model that unifies weight-space learning with linear recurrence to redefine sequence modeling. Unlike conventional recurrent neural networks (RNNs), which collapse temporal dynamics into fixed-dimensional hidden states, WARP explicitly parametrizes its hidden state as the weights and biases of a distinct auxiliary neural network, and uses input differences to drive its recurrence. This brain-inspired formulation enables efficient gradient-free adaptation of the auxiliary network at test time, in-context learning abilities, and seamless integration of domain-specific physical priors. Empirical validation shows that WARP matches or surpasses state-of-the-art baselines on diverse classification tasks, placing in the top three on 4 of 6 challenging real-world datasets. Furthermore, extensive experiments across sequential image completion, multivariate time series forecasting, and dynamical system reconstruction demonstrate its expressiveness and generalization capabilities. Remarkably, a physics-informed variant of our model outperforms the next best model by more than 10x. Ablation studies confirm the architectural necessity of key components, solidifying weight-space linear RNNs as a transformative paradigm for adaptive machine intelligence.

Disclaimer
This report is AI-GENERATED using large language models and WisPaper (a scholar search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes WARP, a model that parametrizes recurrent hidden states as weights and biases of an auxiliary neural network, driven by input differences through linear recurrence. Within the taxonomy, it resides in the 'Weight-Space and Meta-Learning Recurrence' leaf under 'Linear Recurrent Architectures and Mechanisms'. This leaf contains only three papers total, including WARP itself and two siblings (MesaNet and Longhorn), indicating a relatively sparse and emerging research direction compared to more crowded areas like 'State Space Models' (four papers) or 'Specialized Sequence Tasks' (six papers).

The taxonomy reveals that WARP's neighboring research directions include 'Gated Linear Recurrence Models' (three papers on gating mechanisms), 'State Space Models and Structured Recurrence' (four papers on Mamba-style architectures), and 'Hybrid Linear-Attention Architectures' (three papers combining recurrence with attention). The 'Representation Learning and Meta-Modeling' branch contains related work on weight-space embeddings and fast weight programming, though these focus on learning representations of weights rather than using weights as recurrent states. WARP bridges meta-modeling concepts with linear recurrence, occupying a distinct position at the intersection of these themes.

Among the eleven candidates examined through limited semantic search, none was found to clearly refute any of WARP's three contributions. For the first contribution (weight-space linear RNN framework), one candidate was examined and no refutation was found. For the second contribution (parallelizable training algorithms), no candidates were examined, leaving its novelty unassessed within this search scope. For the third contribution (benchmark performance), ten candidates were examined, none of which provided overlapping prior work. This suggests that, within the limited search radius, WARP's approach appears relatively distinct, though the small candidate pool (eleven papers in total) means substantial related work may exist beyond this scope.

Based on the limited literature search covering eleven candidates from semantic similarity, WARP appears to occupy a sparsely populated research niche. The taxonomy structure confirms that weight-space recurrence remains an emerging area with few direct comparisons. However, the restricted search scope—examining only top-K semantic matches rather than exhaustive citation networks—means this assessment reflects local novelty within the examined neighborhood rather than comprehensive field coverage. Broader exploration of meta-learning and neural ODE literature could reveal additional relevant prior work.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 11
Refutable Papers: 0

Research Landscape Overview

Core task: sequence modeling with weight-space linear recurrence. The field encompasses a diverse set of approaches that leverage linear recurrence relations, either in activation space or in weight space, to process sequential data efficiently. The taxonomy organizes this landscape into several main branches:

- Linear Recurrent Architectures and Mechanisms explores core model designs such as state-space models (Mamba[4]), gated variants (Griffin[1], GateLoop[5]), and linear recurrent units (Linear Recurrent Units[2], Behavior-dependent LRU[3]).
- Training and Optimization Methods addresses how these models are learned.
- Theoretical Foundations and Analysis investigates expressiveness and convergence properties (Universality Linear Recurrences[34]).
- Domain-Specific Applications targets tasks like time series forecasting and speech recognition.
- Efficiency and Deployment Optimization focuses on hardware-aware implementations.
- Representation Learning and Meta-Modeling examines higher-order abstractions over network parameters (Universal Neural Functionals[40], Scalable Weight Space[13]).
- Auxiliary Methods and Baselines provides comparative benchmarks and hybrid designs.

A particularly active line of work centers on architectures that balance expressiveness with computational efficiency, contrasting traditional gated recurrences (Resurrecting RNNs[8]) with newer linear-time mechanisms (Mamba[4], Griffin[1]) and exploring bidirectional processing (Bidirectional Linear Recurrent[9]). Another emerging theme is meta-learning and weight-space recurrence, where models operate on or generate parameters of other networks rather than directly on input sequences. Weight-Space Linear RNN[0] sits squarely within this meta-modeling branch, sharing conceptual ground with MesaNet[20] and Longhorn[24], which also treat network weights as evolving sequences.
Compared to these neighbors, Weight-Space Linear RNN[0] emphasizes linear recurrence dynamics in parameter space, offering a distinct angle on how recurrent structure can be embedded at the level of model weights rather than activations. This positioning highlights ongoing questions about where recurrence should be applied—whether in feature representations, in gating mechanisms, or in the parameterization itself—and how such choices affect both learning dynamics and generalization.

Claimed Contributions

Weight-space linear RNN framework with input-difference-driven recurrence

The authors introduce a novel framework that parametrizes RNN hidden states as weights of an auxiliary neural network and uses input differences (rather than raw inputs) to drive linear recurrence. This design combines the efficiency of linear recurrence with the expressivity of non-linear decoding, and is claimed to be the first to treat weight-space features as intermediate hidden state representations in a recurrence.

1 retrieved paper
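As an illustration of the claimed mechanism only: the paper's exact parameterization is not reproduced here, and the sizes, the diagonal transition `A`, the projection `B`, and the query-based decoding below are all hypothetical. The sketch flattens a tiny auxiliary MLP's weights into a vector `theta`, updates `theta` with a linear recurrence driven by input differences, and decodes non-linearly by evaluating the MLP.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (hypothetical, not from the paper): input dim,
# query dim, and hidden width of the auxiliary MLP.
d_in, d_q, d_h = 4, 1, 8
P = d_q * d_h + d_h + d_h + 1     # flattened parameter count of the auxiliary MLP

A = rng.uniform(0.9, 0.999, P)        # diagonal transition in weight space (hypothetical)
B = rng.normal(0.0, 0.1, (P, d_in))   # maps input differences into weight space (hypothetical)

def decode(theta, q):
    """Unflatten theta into a one-hidden-layer MLP and evaluate it at query q."""
    i = 0
    W1 = theta[i:i + d_h * d_q].reshape(d_h, d_q); i += d_h * d_q
    b1 = theta[i:i + d_h]; i += d_h
    W2 = theta[i:i + d_h].reshape(1, d_h); i += d_h
    b2 = theta[i]
    return (W2 @ np.tanh(W1 @ q + b1) + b2).item()

def warp_step(theta, x_t, x_prev):
    # Hidden state = auxiliary-network weights; the recurrence is linear
    # and driven by the input difference x_t - x_prev, not the raw input.
    return A * theta + B @ (x_t - x_prev)

xs = rng.normal(size=(20, d_in))      # a toy input sequence
theta = np.zeros(P)                   # initial weight-space state
outputs = []
for t in range(1, len(xs)):
    theta = warp_step(theta, xs[t], xs[t - 1])
    outputs.append(decode(theta, np.array([0.5])))
```

The key point the sketch makes concrete is the division of labor: the state update itself stays linear (and hence cheap and parallelizable), while all non-linearity lives in the decoding of the weight-space state.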
Two parallelizable training algorithms enabling gradient-free adaptation, in-context learning, and physics-informed modeling

The authors present two training modes (convolutional and recurrent) that unlock three practical capabilities: gradient-free adaptation of the auxiliary network at test-time, in-context learning without parameter finetuning, and seamless integration of domain-specific physical priors. A physics-informed variant (WARP-Phys) is shown to achieve an order of magnitude lower error on dynamical system reconstruction tasks.

0 retrieved papers
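To illustrate why a linear recurrence admits both a sequential (recurrent) and a parallel (convolutional) computation; this is a generic numpy sketch, not the paper's actual training algorithm. A diagonal recurrence h_t = a * h_{t-1} + u_t unrolls to h_t = sum over s <= t of a^(t-s) * u_s, so the whole state trajectory can be computed either step by step or as one masked convolution over the sequence.

```python
import numpy as np

rng = np.random.default_rng(1)
T, P = 16, 6
a = rng.uniform(0.8, 0.99, P)      # diagonal transition (hypothetical)
u = rng.normal(size=(T, P))        # per-step drive, e.g. projected input differences

# Recurrent mode: sequential state update.
h = np.zeros(P)
rec = []
for t in range(T):
    h = a * h + u[t]
    rec.append(h.copy())
rec = np.array(rec)

# Convolutional mode: h_t = sum_{s<=t} a^(t-s) * u_s, evaluated for all t at once.
powers = a[None, None, :] ** (np.arange(T)[:, None, None] - np.arange(T)[None, :, None])
mask = np.tril(np.ones((T, T)))[:, :, None]   # keep only causal terms s <= t
conv = (powers * mask * u[None, :, :]).sum(axis=1)

assert np.allclose(rec, conv)      # both modes produce identical states
```

The O(T^2) masked convolution here is for clarity only; in practice such recurrences are typically evaluated with FFT-based convolutions or parallel prefix scans.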
Extensive real-world benchmark suite demonstrating state-of-the-art performance on multivariate time series classification

The authors establish a comprehensive evaluation suite spanning classification, reconstruction, adaptation, and memory tasks. Their model achieves top-three performance on five out of six challenging multivariate time series classification datasets, demonstrating competitive or superior results compared to established RNNs, state-space models, and Transformers on tasks requiring both short- and long-range dependency modeling.

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution
