PERK: Long-Context Reasoning as Parameter-Efficient Test-Time Learning

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: test-time learning, long-context reasoning, meta-learning, reasoning algorithm, length extrapolation
Abstract:

Long-context reasoning requires accurately identifying relevant information in extensive, noisy input contexts. In this work, we propose PERK (Parameter-Efficient Reasoning over Knowledge), a scalable approach for learning to encode long contexts using gradient updates at test time. Specifically, PERK employs two nested optimization loops in a meta-training phase. The inner loop rapidly encodes contexts into a low-rank adapter (LoRA) that serves as a parameter-efficient memory module for the base model. Concurrently, the outer loop learns to use the updated adapter to accurately recall and reason over relevant information from the encoded long context. Our evaluations on several long-context reasoning tasks show that PERK significantly outperforms standard long-context finetuning, achieving average absolute performance gains of up to 20% for Qwen-2.5 (0.5B & 7B) on synthetic and real-world long-context reasoning. PERK also maintains its advantages across model scales and families. Compared to specialized long-context LLMs, PERK matches or surpasses their performance. Finally, our analyses show PERK is more robust to reasoning complexity, length extrapolation, and the positions of relevant information in contexts.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces PERK, a meta-learning framework that encodes long contexts into low-rank adapters via nested optimization loops at test time. According to the taxonomy, PERK resides in the 'Meta-Learned Test-Time Optimization' leaf under 'Parameter-Efficient Test-Time Learning'. This leaf contains only two papers: PERK itself and one sibling work. This positioning suggests a relatively sparse research direction within the broader test-time adaptation landscape, indicating that meta-learned approaches to test-time parameter updates for long-context reasoning remain underexplored compared to other strategies like direct training or retrieval-based methods.

The taxonomy reveals that PERK's immediate neighbors include 'Direct Test-Time Training via Next-Token Prediction' (three papers) and 'Specialized Test-Time Adaptation for Dialogue and Retrieval' (two papers), both within the same parent category of parameter-efficient learning. Broader sibling branches include 'Retrieval-Augmented and Dynamic Inference Strategies' and 'Reasoning and Planning with Extended Horizons', which pursue complementary goals through external memory or multi-step deliberation rather than parameter adaptation. The scope notes clarify that PERK's meta-learning distinguishes it from single-loop adaptation methods, while its focus on general long-context reasoning separates it from domain-specific applications like time-series forecasting or robotic control.

Among the fourteen candidate papers examined across the three contributions, no refutable prior work was identified. Ten candidates were examined for the core PERK framework and four for truncated gradient unrolling, with zero refutations in each case; no candidates were retrieved for the Drops-in-the-Ocean evaluation setting. This limited search scope, covering top-K semantic matches and citation expansion rather than an exhaustive review, means only that no directly overlapping prior work was found within the examined literature. The small candidate pool cannot definitively rule out relevant work outside this sample, particularly given the sparse population of the meta-learned test-time optimization category.

Based on the available signals, PERK appears to occupy a relatively novel position within a sparsely populated research direction. The absence of refutable candidates among fourteen examined papers, combined with only one sibling work in the same taxonomy leaf, suggests limited direct precedent for meta-learned test-time parameter adaptation in long-context reasoning. However, the restricted search scope and the existence of related approaches in neighboring categories (direct training, retrieval-based methods) indicate that a more comprehensive literature review might reveal additional connections or incremental overlaps not captured in this analysis.

Taxonomy

Core-task Taxonomy Papers: 25
Claimed Contributions: 3
Contribution Candidate Papers Compared: 14
Refutable Papers: 0

Research Landscape Overview

Core task: long-context reasoning via test-time parameter adaptation. The field addresses how models can dynamically adjust their parameters or inference strategies when confronted with extended contexts that exceed typical training distributions. The taxonomy reveals several complementary directions:

- Test-Time Adaptation Mechanisms for Long-Context Processing explores parameter-efficient updates and meta-learned optimization (e.g., PERK[0], End to End TTT[22]).
- Retrieval-Augmented and Dynamic Inference Strategies leverage external memory or selective retrieval to manage context length (e.g., Dynamic RAG Caching[16]).
- Reasoning and Planning with Extended Horizons focuses on multi-step deliberation and search over longer problem trajectories (e.g., Reflective Planning[9], Backtracking Search[23]).
- Memory Architectures and Context Regularization investigates how to structure and compress information over time (e.g., Memory Augmented Transformers[10], Working Memory Dialogue[6]).
- Domain-Specific Test-Time Adaptation Applications demonstrate these ideas in specialized settings such as forecasting or edge deployment (e.g., Test Time Adaptation Forecasting[2], Mobile Edge LLM[5]).

A particularly active line of work centers on parameter-efficient test-time learning, where models perform lightweight gradient updates or meta-learned optimization at inference to better handle novel long contexts. PERK[0] exemplifies this approach by combining meta-learning with test-time parameter tuning, positioning itself alongside End to End TTT[22], which also integrates adaptation directly into the inference loop. These methods contrast with retrieval-centric strategies like Dynamic RAG Caching[16], which avoid parameter updates by selectively fetching relevant context, and with reasoning-focused approaches such as Reflective Planning[9] or Slow Thinking Survey[3] that emphasize iterative deliberation rather than weight adaptation.
The main trade-off revolves around computational overhead versus adaptability: parameter updates can be costly but offer fine-grained customization, while retrieval and planning methods may scale more gracefully yet rely on the quality of external knowledge or search heuristics. Open questions include how to balance adaptation speed with stability and how to generalize these techniques across diverse reasoning tasks.

Claimed Contributions

PERK: Parameter-Efficient Reasoning over Knowledge framework

The authors introduce PERK, a method that reframes long-context reasoning as test-time learning. It uses nested optimization loops where an inner loop encodes contexts into a low-rank adapter (LoRA) serving as parameter-efficient memory, while an outer loop learns to recall and reason over the encoded information.
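The nested-loop structure can be illustrated with a deliberately tiny, first-order sketch (pure Python). A scalar w stands in for the base model, a scalar a for the LoRA adapter, and the quadratic losses, hyperparameters, and function names are illustrative assumptions, not the paper's implementation:

```python
# Toy first-order sketch of PERK-style nested optimization.
# w : scalar stand-in for the base model's parameters
# a : scalar stand-in for the low-rank adapter (the test-time "memory")
# Each task is (c, q): a context value to encode and a query target.

def inner_adapt(w, c, steps=2, lr=0.3):
    """Inner loop: encode context c into the adapter by gradient descent
    on the encoding loss (w + a - c)^2, starting from a fresh adapter."""
    a = 0.0
    for _ in range(steps):
        a -= lr * 2.0 * (w + a - c)  # d/da of (w + a - c)^2
    return a

def meta_train(tasks, meta_steps=200, meta_lr=0.5):
    """Outer loop: adjust the base w so that, after inner adaptation,
    the adapted model answers queries well. First-order approximation:
    the adapter is treated as a constant in the outer gradient."""
    w = 0.0
    for _ in range(meta_steps):
        g = 0.0
        for c, q in tasks:
            a = inner_adapt(w, c)
            g += 2.0 * (w + a - q)   # d/dw of query loss (w + a - q)^2
        w -= meta_lr * g / len(tasks)
    return w

# Two tasks whose query answer equals the encoded context value.
w_meta = meta_train([(3.0, 3.0), (5.0, 5.0)])
```

With only two inner steps the adaptation is incomplete, so the meta-learned w converges toward the task mean (4.0 here), the initialization from which adaptation works best. This mirrors, in miniature, how the outer loop shapes the base model to exploit the inner loop's adapter.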

10 retrieved papers
Truncated gradient unrolling for scalable meta-learning

The authors develop a truncated gradient unrolling technique that backpropagates only through the final few inner-loop steps rather than the complete optimization trajectory. This substantially reduces memory overhead while enabling PERK to scale to larger models and longer contexts.
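The truncation idea can be sketched on a one-parameter toy problem (pure Python; the quadratic inner loss and all names are illustrative assumptions, not the paper's code): run the full inner unroll forward, but store states and backpropagate only through the last K steps, treating the state entering that window as a constant.

```python
def truncated_meta_grad(w, c, q, T=5, K=2, lr=0.3):
    """Meta-gradient of the query loss (s - q)^2 w.r.t. the base
    parameter w, backpropagated through only the last K of T inner
    steps. Here s = w + a is the adapted output, and one inner step
    on the encoding loss (s - c)^2 is s <- s - lr * 2 * (s - c)."""
    s = w
    window = []                     # states kept for backprop
    for t in range(T):
        if t >= T - K:
            window.append(s)        # only the last K states are stored,
        s = s - lr * 2.0 * (s - c)  # so activation memory is O(K), not O(T)
    # Reverse pass over the truncation window. For this quadratic loss the
    # per-step Jacobian ds_{t+1}/ds_t = (1 - 2*lr) is constant, so the
    # stored states are not used in the product; they are kept to mirror
    # the general recipe, where each Jacobian depends on its stored state.
    g = 2.0 * (s - q)               # dL/ds_T of the query loss
    for _ in window:
        g *= (1.0 - 2.0 * lr)
    return g                        # truncated dL/dw

g_trunc = truncated_meta_grad(0.0, 1.0, 1.0, T=5, K=2)
g_full  = truncated_meta_grad(0.0, 1.0, 1.0, T=5, K=5)  # K = T: full unroll
```

Setting K = T recovers full backpropagation through the trajectory; shrinking K caps the number of stored inner states, which is what lets the real method scale to larger models and longer contexts.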

4 retrieved papers
Drops-in-the-Ocean evaluation setting

The authors propose Drops-in-the-Ocean (DIO), a novel long-context evaluation setting where relevant information is distributionally similar to distractors, addressing limitations of Needle-in-a-Haystack benchmarks where target information is stylistically distinct and easier to identify.
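The contrast with Needle-in-a-Haystack can be made concrete with a small generator (pure Python; the fact template, agent names, and score values are invented for illustration and are not the paper's benchmark data): every fact, including the target, is drawn from the same template and value distribution, so no surface cue marks the "drop".

```python
import random

def build_dio_context(n_facts=50, seed=0):
    """Build a Drops-in-the-Ocean style example: the queried fact is
    distributionally identical to all distractors, unlike a
    Needle-in-a-Haystack needle that is stylistically distinct."""
    rng = random.Random(seed)
    names = [f"agent_{i}" for i in range(n_facts)]
    facts = [f"{name} scored {rng.randint(0, 100)} points." for name in names]
    target = rng.randrange(n_facts)
    question = f"How many points did {names[target]} score?"
    answer = facts[target].split()[2]   # the number inside the target fact
    rng.shuffle(facts)                  # relevant info may sit anywhere
    return " ".join(facts), question, answer

context, question, answer = build_dio_context()
```

Because target and distractors share one template, retrieval must key on the queried entity rather than on stylistic novelty, which is the failure mode the DIO setting is designed to probe.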

0 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

