Weight-Space Linear Recurrent Neural Networks

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: physics-informed machine learning, weight-space learning, meta-learning, deep sequence modeling, linear recurrence, test-time training
Abstract:

We introduce WARP (Weight-space Adaptive Recurrent Prediction), a simple yet powerful model that unifies weight-space learning with linear recurrence to redefine sequence modeling. Unlike conventional recurrent neural networks (RNNs), which collapse temporal dynamics into fixed-dimensional hidden states, WARP explicitly parametrizes its hidden state as the weights and biases of a distinct auxiliary neural network, and uses input differences to drive its recurrence. This brain-inspired formulation enables efficient gradient-free adaptation of the auxiliary network at test time, in-context learning abilities, and seamless integration of domain-specific physical priors. Empirical validation shows that WARP matches or surpasses state-of-the-art baselines on diverse classification tasks, placing in the top three on 4 of 6 challenging real-world datasets. Furthermore, extensive experiments across sequential image completion, multivariate time series forecasting, and dynamical system reconstruction demonstrate its expressiveness and generalization capabilities. Remarkably, a physics-informed variant of our model outperforms the next best model by more than 10x. Ablation studies confirm the architectural necessity of key components, solidifying weight-space linear RNNs as a transformative paradigm for adaptive machine intelligence.

Disclaimer
This report is AI-GENERATED using large language models and WisPaper (a scholar search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes WARP, a model that parametrizes recurrent hidden states as weights and biases of an auxiliary neural network, driven by input differences through linear recurrence. Within the taxonomy, it resides in the 'Weight-Space and Meta-Learning Recurrence' leaf under 'Linear Recurrent Architectures and Mechanisms'. This leaf contains only three papers total, including WARP itself and two siblings (MesaNet and Longhorn), indicating a relatively sparse and emerging research direction compared to more crowded areas like 'State Space Models' (four papers) or 'Specialized Sequence Tasks' (six papers).

The taxonomy reveals that WARP's neighboring research directions include 'Gated Linear Recurrence Models' (three papers on gating mechanisms), 'State Space Models and Structured Recurrence' (four papers on Mamba-style architectures), and 'Hybrid Linear-Attention Architectures' (three papers combining recurrence with attention). The 'Representation Learning and Meta-Modeling' branch contains related work on weight-space embeddings and fast weight programming, though these focus on learning representations of weights rather than using weights as recurrent states. WARP bridges meta-modeling concepts with linear recurrence, occupying a distinct position at the intersection of these themes.

Among the eleven candidates examined through limited semantic search, none was found to clearly refute any of WARP's three contributions. For the first contribution (weight-space linear RNN framework), one candidate was examined and no refutation was found. For the second contribution (parallelizable training algorithms), no candidates were examined, leaving its novelty unassessed within this search scope. For the third contribution (benchmark performance), ten candidates were examined, none of which provided overlapping prior work. This suggests that, within the limited search radius, WARP's approach appears relatively distinct, though the small candidate pool (eleven papers in total) means substantial related work may exist beyond this scope.

Based on the limited literature search covering eleven candidates from semantic similarity, WARP appears to occupy a sparsely populated research niche. The taxonomy structure confirms that weight-space recurrence remains an emerging area with few direct comparisons. However, the restricted search scope—examining only top-K semantic matches rather than exhaustive citation networks—means this assessment reflects local novelty within the examined neighborhood rather than comprehensive field coverage. Broader exploration of meta-learning and neural ODE literature could reveal additional relevant prior work.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 11
Refutable Papers: 0

Research Landscape Overview

Core task: sequence modeling with weight-space linear recurrence. The field encompasses a diverse set of approaches that leverage linear recurrence relations, either in activation space or in weight space, to process sequential data efficiently. The taxonomy organizes this landscape into several main branches:

- Linear Recurrent Architectures and Mechanisms explores core model designs such as state-space models (Mamba[4]), gated variants (Griffin[1], GateLoop[5]), and linear recurrent units (Linear Recurrent Units[2], Behavior-dependent LRU[3]).
- Training and Optimization Methods addresses how these models are learned.
- Theoretical Foundations and Analysis investigates expressiveness and convergence properties (Universality Linear Recurrences[34]).
- Domain-Specific Applications targets tasks like time series forecasting and speech recognition.
- Efficiency and Deployment Optimization focuses on hardware-aware implementations.
- Representation Learning and Meta-Modeling examines higher-order abstractions over network parameters (Universal Neural Functionals[40], Scalable Weight Space[13]).
- Auxiliary Methods and Baselines provides comparative benchmarks and hybrid designs.

A particularly active line of work centers on architectures that balance expressiveness with computational efficiency, contrasting traditional gated recurrences (Resurrecting RNNs[8]) with newer linear-time mechanisms (Mamba[4], Griffin[1]) and exploring bidirectional processing (Bidirectional Linear Recurrent[9]). Another emerging theme is meta-learning and weight-space recurrence, where models operate on or generate parameters of other networks rather than directly on input sequences. Weight-Space Linear RNN[0] sits squarely within this meta-modeling branch, sharing conceptual ground with MesaNet[20] and Longhorn[24], which also treat network weights as evolving sequences.
Compared to these neighbors, Weight-Space Linear RNN[0] emphasizes linear recurrence dynamics in parameter space, offering a distinct angle on how recurrent structure can be embedded at the level of model weights rather than activations. This positioning highlights ongoing questions about where recurrence should be applied—whether in feature representations, in gating mechanisms, or in the parameterization itself—and how such choices affect both learning dynamics and generalization.

Claimed Contributions

Weight-space linear RNN framework with input-difference-driven recurrence

The authors introduce a novel framework that parametrizes RNN hidden states as weights of an auxiliary neural network and uses input differences (rather than raw inputs) to drive linear recurrence. This design combines the efficiency of linear recurrence with the expressivity of non-linear decoding, and is claimed to be the first to treat weight-space features as intermediate hidden state representations in a recurrence.

1 retrieved paper
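As an illustration of the claimed mechanism only: the paper's exact parameterization is not reproduced here, and the sizes, the diagonal transition `A`, the projection `B`, and the query-based decoding below are all hypothetical. The sketch flattens a tiny auxiliary MLP's weights into a vector `theta`, updates `theta` with a linear recurrence driven by input differences, and decodes non-linearly by evaluating the MLP.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (hypothetical, not from the paper): input dim,
# query dim, and hidden width of the auxiliary MLP.
d_in, d_q, d_h = 4, 1, 8
P = d_q * d_h + d_h + d_h + 1     # flattened parameter count of the auxiliary MLP

A = rng.uniform(0.9, 0.999, P)        # diagonal transition in weight space (hypothetical)
B = rng.normal(0.0, 0.1, (P, d_in))   # maps input differences into weight space (hypothetical)

def decode(theta, q):
    """Unflatten theta into a one-hidden-layer MLP and evaluate it at query q."""
    i = 0
    W1 = theta[i:i + d_h * d_q].reshape(d_h, d_q); i += d_h * d_q
    b1 = theta[i:i + d_h]; i += d_h
    W2 = theta[i:i + d_h].reshape(1, d_h); i += d_h
    b2 = theta[i]
    return (W2 @ np.tanh(W1 @ q + b1) + b2).item()

def warp_step(theta, x_t, x_prev):
    # Hidden state = auxiliary-network weights; the recurrence is linear
    # and driven by the input difference x_t - x_prev, not the raw input.
    return A * theta + B @ (x_t - x_prev)

xs = rng.normal(size=(20, d_in))      # a toy input sequence
theta = np.zeros(P)                   # initial weight-space state
outputs = []
for t in range(1, len(xs)):
    theta = warp_step(theta, xs[t], xs[t - 1])
    outputs.append(decode(theta, np.array([0.5])))
```

The key point the sketch makes concrete is the division of labor: the state update itself stays linear (and hence cheap and parallelizable), while all non-linearity lives in the decoding of the weight-space state.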
Two parallelizable training algorithms enabling gradient-free adaptation, in-context learning, and physics-informed modeling

The authors present two training modes (convolutional and recurrent) that unlock three practical capabilities: gradient-free adaptation of the auxiliary network at test-time, in-context learning without parameter finetuning, and seamless integration of domain-specific physical priors. A physics-informed variant (WARP-Phys) is shown to achieve an order of magnitude lower error on dynamical system reconstruction tasks.

0 retrieved papers
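To illustrate why a linear recurrence admits both a sequential (recurrent) and a parallel (convolutional) computation; this is a generic numpy sketch, not the paper's actual training algorithm. A diagonal recurrence h_t = a * h_{t-1} + u_t unrolls to h_t = sum over s <= t of a^(t-s) * u_s, so the whole state trajectory can be computed either step by step or as one masked convolution over the sequence.

```python
import numpy as np

rng = np.random.default_rng(1)
T, P = 16, 6
a = rng.uniform(0.8, 0.99, P)      # diagonal transition (hypothetical)
u = rng.normal(size=(T, P))        # per-step drive, e.g. projected input differences

# Recurrent mode: sequential state update.
h = np.zeros(P)
rec = []
for t in range(T):
    h = a * h + u[t]
    rec.append(h.copy())
rec = np.array(rec)

# Convolutional mode: h_t = sum_{s<=t} a^(t-s) * u_s, evaluated for all t at once.
powers = a[None, None, :] ** (np.arange(T)[:, None, None] - np.arange(T)[None, :, None])
mask = np.tril(np.ones((T, T)))[:, :, None]   # keep only causal terms s <= t
conv = (powers * mask * u[None, :, :]).sum(axis=1)

assert np.allclose(rec, conv)      # both modes produce identical states
```

The O(T^2) masked convolution here is for clarity only; in practice such recurrences are typically evaluated with FFT-based convolutions or parallel prefix scans.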
Extensive real-world benchmark suite demonstrating state-of-the-art performance on multivariate time series classification

The authors establish a comprehensive evaluation suite spanning classification, reconstruction, adaptation, and memory tasks. Their model achieves top-three performance on five out of six challenging multivariate time series classification datasets, demonstrating competitive or superior results compared to established RNNs, state-space models, and Transformers on tasks requiring both short- and long-range dependency modeling.

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution
