Don't Throw Away Your Pretrained Model

ICLR 2026 Conference Submission. Anonymous Authors.
Keywords: model collaboration, collaborative inference
Abstract:

Alignment training has tradeoffs: it helps language models (LMs) gain in reasoning and instruction following, but they may lose skills such as creativity and calibration, at which unaligned base models are better. We aim to get the best of both worlds through model collaboration, where different models in the training pipeline collaborate and complement each other. Since LM responses interleave skills that favor different models, we propose Switch Generation, where pretrained and aligned model versions take turns "speaking" in a response sequence. Specifically, we train a switcher LM by learning from the outcomes of choosing different models to generate the next segment across diverse queries and contexts. At inference time, the switcher LM guides different model checkpoints to dynamically generate the next segment where their strengths are most needed. Extensive experiments with 8 model collaboration baselines and 18 datasets show that 1) model collaboration consistently outperforms individual models on 16 out of 18 tasks, and 2) Switch Generation further outperforms baselines by 12.9% on average. Further analysis reveals that Switch Generation discovers compositional skills to solve problems where individual models struggle, and that it generalizes to unseen models and tasks, reusing and repurposing by-products of expensive model training pipelines that would otherwise be discarded.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes Switch Generation, a method for dynamically alternating between pretrained and aligned language model checkpoints during response generation. It resides in the 'Dynamic Checkpoint Switching and Routing' leaf, which contains only two papers total (including this one). This sparse population suggests the specific problem of segment-level checkpoint switching is relatively underexplored. The sibling paper in this leaf addresses inference-time alignment refinement, indicating the broader theme of runtime model orchestration is emerging but not yet crowded.

The taxonomy reveals that multi-checkpoint collaboration encompasses several neighboring directions: speculative reasoning with smaller models, multi-model entity alignment, and distributed edge-cloud inference. Switch Generation diverges from these by focusing on fine-grained segment-level routing rather than task-level partitioning or entity-level fusion. The taxonomy's scope notes clarify that static model selection and non-dynamic ensembling belong elsewhere, positioning this work at the intersection of dynamic routing and alignment trade-off mitigation—a boundary that appears less saturated than general distributed inference.

Among the 24 candidate papers examined, the switcher LM training methodology encountered one refutable candidate out of 10, while the core Switch Generation algorithm and the checkpoint reuse framework showed no clear refutations across their 4 and 10 candidates, respectively. This suggests that while the training approach has some overlap with prior routing or selection methods, the segment-level switching mechanism and the specific framing around pretrained-aligned collaboration appear less directly anticipated. The limited search scope (top-K semantic matches) means these statistics reflect the immediate neighborhood rather than exhaustive coverage.

Given the sparse taxonomy leaf and the modest refutation rate across contributions, the work appears to occupy a relatively novel position within the examined literature. However, the analysis is constrained by the 24-candidate search scope and does not capture potential overlap in broader model routing or mixture-of-experts literature outside the semantic neighborhood. The taxonomy structure and contribution-level statistics together suggest incremental novelty in methodology but a fresher angle on the pretrained-aligned checkpoint collaboration problem.

Taxonomy

Core-task Taxonomy Papers: 15
Claimed Contributions: 3
Contribution Candidate Papers Compared: 24
Refutable Papers: 1

Research Landscape Overview

Core task: collaborative inference with pretrained and aligned language model checkpoints. The field encompasses diverse strategies for leveraging multiple model checkpoints or distributed resources to improve inference quality, efficiency, and adaptability. At the top level, one finds branches addressing distributed and edge-cloud setups (e.g., Edge Cloud Collaborative Inference[1]), multi-checkpoint collaboration and dynamic routing (e.g., Pretrained Model[0], Inference Time Alignment[4]), domain-specific and cross-lingual adaptation (e.g., LinguaLinked[11]), privacy-preserving techniques (e.g., Mutual Information Defense[7]), cross-modal foundation models that align vision and language (e.g., Vision Language Quality Assessment[3]), and iterative annotation or pseudo-labeling workflows with LLMs (e.g., Iterative Sentiment Analysis[13]). These branches reflect complementary emphases: some prioritize resource constraints and latency (edge-cloud methods), others focus on combining heterogeneous checkpoints or switching among them at inference time, and still others tackle specialized domains or modalities where alignment is critical.

Within the multi-checkpoint collaboration space, a particularly active line of work explores dynamic checkpoint switching and routing, where systems decide at inference time which model or checkpoint to invoke based on input characteristics or intermediate confidence signals. Pretrained Model[0] sits squarely in this branch, emphasizing mechanisms that orchestrate pretrained and aligned checkpoints to balance accuracy and cost. Nearby, Inference Time Alignment[4] investigates how alignment can be refined during inference rather than solely at training, offering a complementary perspective on when and how to apply collaborative strategies.

Meanwhile, works like Speculative Chain of Thought[6] and Code Completion Collaboration[10] illustrate how multi-model collaboration extends to reasoning-intensive or code-generation tasks, raising open questions about the trade-offs between routing overhead and the gains from specialized or ensemble-based inference. Overall, Pretrained Model[0] contributes to this landscape by addressing the orchestration challenge inherent in leveraging multiple aligned checkpoints, a theme that resonates across several branches but remains especially salient in dynamic routing scenarios.

Claimed Contributions

SWITCHGENERATION collaborative inference algorithm

The authors introduce SWITCHGENERATION, a collaborative inference method where different model checkpoints (pretrained, finetuned, aligned) dynamically generate successive text segments within a single response. A trained switcher language model decides which checkpoint should generate the next segment based on the query and what has been generated so far.
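The loop described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: `switch_generate`, the checkpoint callables, and the rule-based switcher below are all stand-ins for the learned models and segmentation the authors actually use.

```python
# Minimal sketch of a switch-generation inference loop (hypothetical
# interfaces; the paper's switcher and segmenters are learned models).

def switch_generate(query, checkpoints, switcher, max_segments=8):
    """Alternate checkpoints segment-by-segment, as chosen by the switcher."""
    response = ""
    for _ in range(max_segments):
        # The switcher sees the query and the partial response and picks
        # which checkpoint should produce the next segment.
        name = switcher(query, response)
        segment = checkpoints[name](query, response)
        if not segment:  # empty segment signals end of response
            break
        response += segment
    return response

# Toy stand-ins: an "aligned" model contributes direct answer segments,
# a "base" model contributes creative asides, and a rule-based switcher
# alternates between them based on how many segments exist so far.
checkpoints = {
    "aligned": lambda q, r: "Answer part. " if len(r) < 30 else "",
    "base": lambda q, r: "Creative aside. ",
}
switcher = lambda q, r: "base" if r.count(".") % 2 else "aligned"
```

With these toy components, the generated response interleaves segments from both checkpoints until the "aligned" model emits an empty segment and the loop stops.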

Retrieved papers: 4

Switcher LM training methodology

The authors develop a supervised fine-tuning approach for training a small language model (the switcher) that learns to predict which model checkpoint should generate the next text segment. Training data is derived by rolling out different model choices and evaluating their average outcomes across diverse queries and partial responses.
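The rollout-and-average labeling step can be sketched as follows. This is an assumed simplification: `label_context`, `rollout`, and `score` are hypothetical names, and the real pipeline scores full LM generations rather than the toy strings used here.

```python
import statistics

def label_context(query, partial, checkpoints, rollout, score, n=4):
    """Label a (query, partial-response) context with the checkpoint whose
    rollouts achieve the best average outcome, yielding one SFT example."""
    avg_outcome = {}
    for name, model in checkpoints.items():
        # Complete the response n times with this checkpoint and score each
        # completed rollout; average to smooth over sampling noise.
        outcomes = [score(rollout(model, query, partial)) for _ in range(n)]
        avg_outcome[name] = statistics.mean(outcomes)
    best = max(avg_outcome, key=avg_outcome.get)
    return {"query": query, "context": partial, "label": best}
```

Collecting such examples over diverse queries and partial responses yields the supervised data on which the small switcher LM is finetuned.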

Retrieved papers: 10 (1 can refute)

Model collaboration framework reusing training pipeline checkpoints

The authors propose a framework that leverages multiple checkpoints from the standard model training pipeline (pretrained, finetuned, aligned versions) for collaborative inference, rather than discarding earlier checkpoints. This enables complementary strengths across checkpoints to be exploited without additional model training.

Retrieved papers: 10

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution: SWITCHGENERATION collaborative inference algorithm

Contribution: Switcher LM training methodology

Contribution: Model collaboration framework reusing training pipeline checkpoints
