In-Place Test-Time Training

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Test-Time Training, Large Language Model, LLM
Abstract:

The static "train then deploy" paradigm fundamentally limits Large Language Models (LLMs) from dynamically adapting their weights in response to the continuous streams of new information inherent in real-world tasks. Test-Time Training (TTT) offers a compelling alternative by updating a subset of model parameters (fast weights) at inference time, yet its potential in the current LLM ecosystem is hindered by critical barriers, including architectural incompatibility, computational inefficiency, and fast-weight objectives misaligned with language modeling. In this work, we introduce In-Place Test-Time Training (In-Place TTT), a framework that seamlessly endows LLMs with test-time training ability. In-Place TTT treats the final projection matrix of the ubiquitous MLP blocks as its adaptable fast weights, enabling a "drop-in" enhancement for LLMs without costly retraining from scratch. Furthermore, we replace TTT's generic reconstruction objective with a tailored, theoretically grounded objective explicitly aligned with the next-token-prediction task governing autoregressive language modeling. This principled objective, combined with an efficient chunk-wise update mechanism, results in a highly scalable algorithm compatible with context parallelism. Extensive experiments validate our framework's effectiveness: as an in-place enhancement, it enables a 4B-parameter model to achieve superior performance on tasks with contexts up to 128k tokens, and when pretrained from scratch, it consistently outperforms competitive TTT-related approaches. Ablation studies provide further insight into our design choices. Collectively, our results establish In-Place TTT as a promising step towards a paradigm of continual learning in LLMs.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces In-Place Test-Time Training, a framework enabling LLMs to update parameters during inference by treating MLP projection matrices as adaptable fast weights. It resides in the Parameter Update Approaches leaf, which contains six papers including the original work. This leaf sits within Test-Time Adaptation Mechanisms, one of seven major branches in a taxonomy spanning fifty papers across fourteen leaf nodes. The Parameter Update Approaches direction represents a moderately populated research area, focusing on gradient-based weight modifications at test time rather than activation manipulation or external model integration.

The taxonomy reveals neighboring leaves such as Activation-Based Intervention and Auxiliary Model Integration, both under Test-Time Adaptation Mechanisms. Activation-Based Intervention manipulates internal states without weight updates, while Auxiliary Model Integration employs separate lightweight models to guide inference. The paper's approach diverges from these by directly modifying base model parameters in-place. Sibling papers in the same leaf include works like TestTime Learning LLMs and Medadapter, which also update weights but may differ in architectural targets or domain focus. The broader Inference-Time Optimization Strategies branch explores parameter-free methods like prompt engineering and search algorithms, highlighting a fundamental methodological split in the field.

Among twenty-one candidates examined through semantic search and citation expansion, none clearly refutes the three claimed contributions. For the In-Place TTT framework, nine candidates were examined with zero refutable matches, suggesting limited direct overlap with the specific architectural approach of using MLP projection matrices as fast weights. For the LM-aligned objective, ten candidates were examined without refutation, indicating that the tailored next-token-prediction objective may represent a novel formulation within the limited search scope. The chunk-wise update mechanism was compared against only two candidates, reflecting either a sparse research direction or narrow search coverage for this efficiency-focused component.

Based on the limited search scope of twenty-one candidates, the work appears to occupy a distinct position within parameter update approaches, particularly in its architectural integration strategy and objective design. However, the analysis does not cover exhaustive prior work in test-time adaptation or continual learning, and the moderate size of the Parameter Update Approaches leaf suggests active but not overcrowded research activity. The absence of refutable candidates may reflect genuine novelty or limitations in semantic search coverage for this specific combination of techniques.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 21
Refutable Papers: 0

Research Landscape Overview

Core task: test-time training for large language models. This field explores how models can adapt or improve their performance during inference rather than relying solely on pre-training and fine-tuning. The taxonomy organizes the landscape into seven main branches. Test-Time Adaptation Mechanisms focuses on parameter update approaches and dynamic adjustments that modify model weights or internal states at inference, as seen in works like InPlace TestTime Training[0] and TestTime Learning LLMs[1]. Inference-Time Optimization Strategies emphasizes prompt engineering, search methods, and iterative refinement without parameter changes. Training-Based Test-Time Scaling investigates reinforcement learning and self-improvement loops that leverage test-time compute for reasoning tasks, exemplified by approaches such as Scaling TestTime Compute[14] and OpenR[23]. Post-Training Paradigms and Frameworks examine broader methodologies like continual learning and meta-learning that prepare models for test-time adaptation, while the Domain-Specific Applications and Computational Efficiency branches address practical deployment concerns. Analysis and Evaluation Frameworks provide benchmarks and theoretical insights, including surveys like PostTraining Scaling Survey[2] and TestTime Scaling Survey[12].

A particularly active line of work contrasts parameter-updating methods with parameter-free inference strategies. Parameter update approaches, such as InPlace TestTime Training[0] and Medadapter[7], directly modify model weights using test examples or domain-specific data, trading computational cost for potentially stronger adaptation. In contrast, methods like Dynamic Evaluation Online[10] and TestTime Training FewShot[35] explore lighter-weight adjustments or prompt-based interventions that preserve the original model. InPlace TestTime Training[0] sits squarely within the parameter update cluster, emphasizing efficient in-place weight modifications during inference.

Compared to TestTime Learning LLMs[1], which may explore broader learning signals, and Medadapter[7], which targets medical domain adaptation, InPlace TestTime Training[0] appears to prioritize computational efficiency and minimal overhead while still enabling meaningful model updates. This positioning reflects ongoing debates about the trade-offs between adaptation strength, inference latency, and resource constraints in real-world deployments.

Claimed Contributions

In-Place Test-Time Training framework for LLMs

The authors propose a framework that enables Large Language Models to dynamically update their weights at inference time by repurposing existing MLP blocks as adaptable fast weights. This drop-in enhancement requires no architectural modifications or costly retraining from scratch, addressing the architectural incompatibility barrier of previous TTT methods.

9 retrieved papers
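The repurposing described above can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the names (`d_model`, `d_ff`), the ReLU MLP, and the squared-error surrogate objective are assumptions; only the core idea, an additive fast-weight delta on the MLP's final down-projection, updated by gradient descent at test time while the pretrained slow weights stay frozen, comes from the contribution description.

```python
import numpy as np

# Hedged sketch: the MLP's final down-projection carries an additive
# fast-weight delta dW, adapted at inference; W_up and W_down stay frozen.
rng = np.random.default_rng(0)
d_model, d_ff = 8, 32                                # illustrative sizes

W_up = rng.standard_normal((d_model, d_ff)) * 0.1    # frozen slow weight
W_down = rng.standard_normal((d_ff, d_model)) * 0.1  # frozen slow weight
dW = np.zeros_like(W_down)                           # adaptable fast weight

def mlp(x):
    h = np.maximum(x @ W_up, 0.0)        # up-projection + ReLU
    return h @ (W_down + dW)             # fast weight applied "in place"

def ttt_step(x, target, lr=0.1):
    """One test-time gradient step on dW for 0.5 * ||mlp(x) - target||^2."""
    global dW
    h = np.maximum(x @ W_up, 0.0)
    err = h @ (W_down + dW) - target
    dW -= lr * h.T @ err / len(x)        # only the fast weight moves

x = rng.standard_normal((4, d_model))
target = rng.standard_normal((4, d_model))
before = float(np.mean((mlp(x) - target) ** 2))
for _ in range(100):
    ttt_step(x, target)
after = float(np.mean((mlp(x) - target) ** 2))
print(after < before)
```

Because `dW` is the only tensor that changes, the base checkpoint is untouched, which is what makes the enhancement "drop-in" for an existing architecture.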
LM-aligned objective for TTT

The authors introduce a novel learning objective that aligns with the Next-Token Prediction goal of language models, replacing the generic reconstruction targets used in prior TTT work. This objective is designed to encourage fast weights to store predictively useful information for autoregressive language modeling.

10 retrieved papers
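The contrast between a generic reconstruction target and an LM-aligned target can be illustrated with a toy model; this is not the paper's exact formulation. The vocabulary size, token embeddings `E`, and the linear fast weight `W` mapping hidden states to next-token logits are all assumptions; the point carried over from the description is that the fast-weight loss is a cross-entropy on the *next* token, matching the autoregressive training signal, rather than a reconstruction of the current input.

```python
import numpy as np

# Hedged toy sketch of an LM-aligned fast-weight objective: the update rule
# descends the next-token cross-entropy instead of an input-reconstruction loss.
rng = np.random.default_rng(0)
V, d = 5, 8                              # illustrative vocab and hidden sizes
E = rng.standard_normal((V, d)) * 0.5    # toy token embeddings
W = np.zeros((d, V))                     # fast weight: hidden -> next-token logits

def next_token_nll(h, next_id):
    z = h @ W
    z = z - z.max()                      # numerically stable log-softmax
    return -(z[next_id] - np.log(np.exp(z).sum()))

def ttt_step(h, next_id, lr=0.5):
    """Fast-weight gradient step on the next-token cross-entropy."""
    global W
    z = h @ W
    p = np.exp(z - z.max()); p /= p.sum()
    p[next_id] -= 1.0                    # softmax cross-entropy logit gradient
    W -= lr * np.outer(h, p)

seq = [0, 3, 1, 3, 1, 3, 1]              # toy stream with a "1 -> 3" pattern
before = next_token_nll(E[1], 3)         # with W = 0: uniform, log(5) nats
for _ in range(3):                       # a few passes over the stream
    for t in range(len(seq) - 1):
        ttt_step(E[seq[t]], seq[t + 1])
after = next_token_nll(E[1], 3)
print(after < before)
```

Under this objective the fast weights are rewarded for storing information that is *predictively* useful for upcoming tokens, which is the alignment property the contribution claims over generic reconstruction targets.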
Efficient chunk-wise update mechanism with context parallelism

The authors develop an efficient chunk-wise update strategy that leverages parallel scan algorithms to enable context parallelism while maintaining strict causal semantics. This design addresses the computational inefficiency of per-token updates in previous TTT methods and enables high throughput on modern accelerators.

2 retrieved papers
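The chunk-wise scheme can be sketched as follows; the paper's parallel-scan machinery is not reproduced here, and the linear model, chunk size, and squared-error surrogate are illustrative assumptions. The property carried over from the description is causal semantics at chunk granularity: every token inside a chunk is processed with the fast weight frozen at the chunk's start, and a single aggregated update is applied at each chunk boundary.

```python
import numpy as np

# Hedged sketch of chunk-wise test-time updates: one forward pass and one
# aggregated gradient step per chunk, instead of a strictly sequential
# per-token update recurrence.
rng = np.random.default_rng(0)
d, T, chunk = 8, 16, 4                           # illustrative sizes

def chunked_ttt(xs, targets, W, lr=0.1):
    outs = []
    for s in range(0, len(xs), chunk):
        xc, tc = xs[s:s + chunk], targets[s:s + chunk]
        outs.append(xc @ W)                      # all tokens use chunk-start W
        grad = xc.T @ (xc @ W - tc) / len(xc)    # one aggregated gradient
        W = W - lr * grad                        # single update per boundary
    return np.concatenate(outs), W

xs = rng.standard_normal((T, d))
targets = rng.standard_normal((T, d))
W0 = rng.standard_normal((d, d)) * 0.1
outs, W_final = chunked_ttt(xs, targets, W0.copy())

# Causality check: the first chunk's outputs depend only on the initial W,
# never on gradients computed from its own or later tokens.
print(np.allclose(outs[:chunk], xs[:chunk] @ W0))
```

Per-token updates force a token-by-token serial recurrence; amortizing the update over a chunk exposes within-chunk parallelism, and when the chunk-to-chunk state transition is associative it can be combined across devices with a parallel scan, which is presumably what makes the design compatible with context parallelism.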

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

In-Place Test-Time Training framework for LLMs

The authors propose a framework that enables Large Language Models to dynamically update their weights at inference time by repurposing existing MLP blocks as adaptable fast weights. This drop-in enhancement requires no architectural modifications or costly retraining from scratch, addressing the architectural incompatibility barrier of previous TTT methods.

Contribution

LM-aligned objective for TTT

The authors introduce a novel learning objective that aligns with the Next-Token Prediction goal of language models, replacing the generic reconstruction targets used in prior TTT work. This objective is designed to encourage fast weights to store predictively useful information for autoregressive language modeling.

Contribution

Efficient chunk-wise update mechanism with context parallelism

The authors develop an efficient chunk-wise update strategy that leverages parallel scan algorithms to enable context parallelism while maintaining strict causal semantics. This design addresses the computational inefficiency of per-token updates in previous TTT methods and enables high throughput on modern accelerators.

In-Place Test-Time Training | Novelty Validation