PERK: Long-Context Reasoning as Parameter-Efficient Test-Time Learning

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: test-time learning, long-context reasoning, meta-learning, reasoning algorithm, length extrapolation
Abstract:

Long-context reasoning requires accurately identifying relevant information in extensive, noisy input contexts. In this work, we propose PERK (Parameter-Efficient Reasoning over Knowledge), a scalable approach for learning to encode long contexts using gradient updates at test time. Specifically, PERK employs two nested optimization loops in a meta-training phase. The inner loop rapidly encodes contexts into a low-rank adapter (LoRA) that serves as a parameter-efficient memory module for the base model. Concurrently, the outer loop learns to use the updated adapter to accurately recall and reason over relevant information from the encoded long context. Our evaluations on several long-context reasoning tasks show that PERK significantly outperforms standard long-context finetuning, achieving average absolute performance gains of up to 20% for Qwen-2.5 (0.5B & 7B) on synthetic and real-world long-context reasoning. PERK also maintains its advantages across model scales and families. Compared to specialized long-context LLMs, PERK matches or surpasses their performance. Finally, our analyses show PERK is more robust to reasoning complexity, length extrapolation, and the positions of relevant information in contexts.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces PERK, a meta-learning framework that encodes long contexts into low-rank adapters via nested optimization loops at test time. According to the taxonomy, PERK resides in the 'Meta-Learned Test-Time Optimization' leaf under 'Parameter-Efficient Test-Time Learning'. This leaf contains only two papers: PERK itself and one sibling work. This positioning suggests a relatively sparse research direction within the broader test-time adaptation landscape, indicating that meta-learned approaches to test-time parameter updates for long-context reasoning remain underexplored compared to other strategies like direct training or retrieval-based methods.

The taxonomy reveals that PERK's immediate neighbors include 'Direct Test-Time Training via Next-Token Prediction' (three papers) and 'Specialized Test-Time Adaptation for Dialogue and Retrieval' (two papers), both within the same parent category of parameter-efficient learning. Broader sibling branches include 'Retrieval-Augmented and Dynamic Inference Strategies' and 'Reasoning and Planning with Extended Horizons', which pursue complementary goals through external memory or multi-step deliberation rather than parameter adaptation. The scope notes clarify that PERK's meta-learning distinguishes it from single-loop adaptation methods, while its focus on general long-context reasoning separates it from domain-specific applications like time-series forecasting or robotic control.

Among the fourteen candidate papers examined across the three contributions, no refutable prior work was identified. Ten candidates were examined for the core PERK framework and four for truncated gradient unrolling, with zero refutations in each case; no candidates were retrieved for the Drops-in-the-Ocean evaluation setting. This limited search scope, covering top-K semantic matches and citation expansion rather than an exhaustive review, means only that no directly overlapping prior work was found within the examined literature. The small candidate pool cannot definitively rule out relevant work outside this sample, particularly given the sparse population of the meta-learned test-time optimization category.

Based on the available signals, PERK appears to occupy a relatively novel position within a sparsely populated research direction. The absence of refutable candidates among fourteen examined papers, combined with only one sibling work in the same taxonomy leaf, suggests limited direct precedent for meta-learned test-time parameter adaptation in long-context reasoning. However, the restricted search scope and the existence of related approaches in neighboring categories (direct training, retrieval-based methods) indicate that a more comprehensive literature review might reveal additional connections or incremental overlaps not captured in this analysis.

Taxonomy

Core-task Taxonomy Papers: 25
Claimed Contributions: 3
Contribution Candidate Papers Compared: 14
Refutable Papers: 0

Research Landscape Overview

Core task: long-context reasoning via test-time parameter adaptation. The field addresses how models can dynamically adjust their parameters or inference strategies when confronted with extended contexts that exceed typical training distributions. The taxonomy reveals several complementary directions:

- Test-Time Adaptation Mechanisms for Long-Context Processing explores parameter-efficient updates and meta-learned optimization (e.g., PERK[0], End to End TTT[22]).
- Retrieval-Augmented and Dynamic Inference Strategies leverage external memory or selective retrieval to manage context length (e.g., Dynamic RAG Caching[16]).
- Reasoning and Planning with Extended Horizons focuses on multi-step deliberation and search over longer problem trajectories (e.g., Reflective Planning[9], Backtracking Search[23]).
- Memory Architectures and Context Regularization investigates how to structure and compress information over time (e.g., Memory Augmented Transformers[10], Working Memory Dialogue[6]).
- Domain-Specific Test-Time Adaptation Applications demonstrate these ideas in specialized settings such as forecasting or edge deployment (e.g., Test Time Adaptation Forecasting[2], Mobile Edge LLM[5]).

A particularly active line of work centers on parameter-efficient test-time learning, where models perform lightweight gradient updates or meta-learned optimization at inference to better handle novel long contexts. PERK[0] exemplifies this approach by combining meta-learning with test-time parameter tuning, positioning itself alongside End to End TTT[22], which also integrates adaptation directly into the inference loop. These methods contrast with retrieval-centric strategies like Dynamic RAG Caching[16], which avoid parameter updates by selectively fetching relevant context, and with reasoning-focused approaches such as Reflective Planning[9] or Slow Thinking Survey[3] that emphasize iterative deliberation rather than weight adaptation.
The main trade-off revolves around computational overhead versus adaptability: parameter updates can be costly but offer fine-grained customization, while retrieval and planning methods may scale more gracefully yet rely on the quality of external knowledge or search heuristics. Open questions include how to balance adaptation speed with stability and how to generalize these techniques across diverse reasoning tasks.

Claimed Contributions

PERK: Parameter-Efficient Reasoning over Knowledge framework

The authors introduce PERK, a method that reframes long-context reasoning as test-time learning. It uses nested optimization loops where an inner loop encodes contexts into a low-rank adapter (LoRA) serving as parameter-efficient memory, while an outer loop learns to recall and reason over the encoded information.
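The nested-loop structure can be illustrated with a deliberately tiny, first-order sketch (pure Python). A scalar w stands in for the base model, a scalar a for the LoRA adapter, and the quadratic losses, hyperparameters, and function names are illustrative assumptions, not the paper's implementation:

```python
# Toy first-order sketch of PERK-style nested optimization.
# w : scalar stand-in for the base model's parameters
# a : scalar stand-in for the low-rank adapter (the test-time "memory")
# Each task is (c, q): a context value to encode and a query target.

def inner_adapt(w, c, steps=2, lr=0.3):
    """Inner loop: encode context c into the adapter by gradient descent
    on the encoding loss (w + a - c)^2, starting from a fresh adapter."""
    a = 0.0
    for _ in range(steps):
        a -= lr * 2.0 * (w + a - c)  # d/da of (w + a - c)^2
    return a

def meta_train(tasks, meta_steps=200, meta_lr=0.5):
    """Outer loop: adjust the base w so that, after inner adaptation,
    the adapted model answers queries well. First-order approximation:
    the adapter is treated as a constant in the outer gradient."""
    w = 0.0
    for _ in range(meta_steps):
        g = 0.0
        for c, q in tasks:
            a = inner_adapt(w, c)
            g += 2.0 * (w + a - q)   # d/dw of query loss (w + a - q)^2
        w -= meta_lr * g / len(tasks)
    return w

# Two tasks whose query answer equals the encoded context value.
w_meta = meta_train([(3.0, 3.0), (5.0, 5.0)])
```

With only two inner steps the adaptation is incomplete, so the meta-learned w converges toward the task mean (4.0 here), the initialization from which adaptation works best. This mirrors, in miniature, how the outer loop shapes the base model to exploit the inner loop's adapter.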

10 retrieved papers
Truncated gradient unrolling for scalable meta-learning

The authors develop a truncated gradient unrolling technique that backpropagates only through the final few inner-loop steps rather than the complete optimization trajectory. This substantially reduces memory overhead while enabling PERK to scale to larger models and longer contexts.
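The truncation idea can be sketched on a one-parameter toy problem (pure Python; the quadratic inner loss and all names are illustrative assumptions, not the paper's code): run the full inner unroll forward, but store states and backpropagate only through the last K steps, treating the state entering that window as a constant.

```python
def truncated_meta_grad(w, c, q, T=5, K=2, lr=0.3):
    """Meta-gradient of the query loss (s - q)^2 w.r.t. the base
    parameter w, backpropagated through only the last K of T inner
    steps. Here s = w + a is the adapted output, and one inner step
    on the encoding loss (s - c)^2 is s <- s - lr * 2 * (s - c)."""
    s = w
    window = []                     # states kept for backprop
    for t in range(T):
        if t >= T - K:
            window.append(s)        # only the last K states are stored,
        s = s - lr * 2.0 * (s - c)  # so activation memory is O(K), not O(T)
    # Reverse pass over the truncation window. For this quadratic loss the
    # per-step Jacobian ds_{t+1}/ds_t = (1 - 2*lr) is constant, so the
    # stored states are not used in the product; they are kept to mirror
    # the general recipe, where each Jacobian depends on its stored state.
    g = 2.0 * (s - q)               # dL/ds_T of the query loss
    for _ in window:
        g *= (1.0 - 2.0 * lr)
    return g                        # truncated dL/dw

g_trunc = truncated_meta_grad(0.0, 1.0, 1.0, T=5, K=2)
g_full  = truncated_meta_grad(0.0, 1.0, 1.0, T=5, K=5)  # K = T: full unroll
```

Setting K = T recovers full backpropagation through the trajectory; shrinking K caps the number of stored inner states, which is what lets the real method scale to larger models and longer contexts.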

4 retrieved papers
Drops-in-the-Ocean evaluation setting

The authors propose Drops-in-the-Ocean (DIO), a novel long-context evaluation setting where relevant information is distributionally similar to distractors, addressing limitations of Needle-in-a-Haystack benchmarks where target information is stylistically distinct and easier to identify.
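The contrast with Needle-in-a-Haystack can be made concrete with a small generator (pure Python; the fact template, agent names, and score values are invented for illustration and are not the paper's benchmark data): every fact, including the target, is drawn from the same template and value distribution, so no surface cue marks the "drop".

```python
import random

def build_dio_context(n_facts=50, seed=0):
    """Build a Drops-in-the-Ocean style example: the queried fact is
    distributionally identical to all distractors, unlike a
    Needle-in-a-Haystack needle that is stylistically distinct."""
    rng = random.Random(seed)
    names = [f"agent_{i}" for i in range(n_facts)]
    facts = [f"{name} scored {rng.randint(0, 100)} points." for name in names]
    target = rng.randrange(n_facts)
    question = f"How many points did {names[target]} score?"
    answer = facts[target].split()[2]   # the number inside the target fact
    rng.shuffle(facts)                  # relevant info may sit anywhere
    return " ".join(facts), question, answer

context, question, answer = build_dio_context()
```

Because target and distractors share one template, retrieval must key on the queried entity rather than on stylistic novelty, which is the failure mode the DIO setting is designed to probe.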

0 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

