RAIN-Merging: A Gradient-Free Method to Enhance Instruction Following in Large Reasoning Models with Preserved Thinking Format

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Large Reasoning Model, Instruction Following, Model Merging, Null-Space
Abstract:

Large reasoning models (LRMs) excel at long chains of reasoning but often fail to faithfully follow instructions about output format, constraints, or other specific requirements. We investigate whether this gap can be closed by integrating an instruction-tuned model (ITM) into an LRM. Analyzing their differences in parameter space, namely task vectors, we find that their principal subspaces are nearly orthogonal across key modules, suggesting that a lightweight merge is possible with minimal interference. However, we also demonstrate that naïve merges are fragile because they overlook the output-format mismatch between LRMs (with explicit thinking and response segments) and ITMs (answer-only). We introduce RAIN-Merging (Reasoning-Aware Instruction-attention guided Null-space projection Merging), a gradient-free method that integrates instruction following while preserving the thinking format and reasoning performance. First, using a small reasoning calibration set, we project the ITM task vector onto the null space of forward features at thinking special tokens, which preserves the LRM's structured reasoning mechanisms. Second, using a small instruction calibration set, we estimate instruction attention to derive module-specific scaling that amplifies instruction-relevant components and suppresses leakage. Across four instruction-following benchmarks and nine reasoning and general-capability benchmarks, RAIN-Merging substantially improves instruction adherence while maintaining reasoning quality. The gains are consistent across model scales and architectures and translate into improved performance in agent settings.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholar search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While the system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes RAIN-Merging, a gradient-free method that integrates instruction-following capabilities from instruction-tuned models into large reasoning models through null-space projection and attention-guided merging. It resides in the 'Weight-Space Merging and Task Vector Methods' leaf, which contains only two papers total. This is a notably sparse research direction within the broader taxonomy of 50 papers across 36 topics, suggesting that parameter-space merging techniques for reasoning-instruction integration remain relatively underexplored compared to training-based adaptation methods or inference-time steering approaches.

The taxonomy reveals that most related work clusters in adjacent branches: 'Training-Based Adaptation and Instruction Tuning' contains 18 papers across four sub-areas, while 'Inference-Time Steering and Optimization' includes six papers. The paper's approach diverges from these by avoiding both retraining and inference-time prompting, instead operating directly in weight space. Its sibling paper (Disperse-then-Merge) explores iterative dispersion strategies, whereas RAIN-Merging emphasizes reasoning-aware projection to preserve structured thinking formats. The MoE-Based Integration leaf offers an architectural alternative with two papers, but these require learned routing rather than direct parameter fusion.

Among the 29 candidates examined across the three contributions, no clearly refuting prior work was identified: 10 candidates for the RAIN-Merging method, 9 for the null-space projection technique, and 10 for the instruction-attention guided coefficients, each with zero refutable matches. Within this limited search scope, the specific combination of reasoning-aware null-space projection with attention-guided merging coefficients therefore appears distinct from examined prior work. However, a search scale of 29 candidates is modest relative to the broader literature on model merging and instruction tuning.

Based on the top-29 semantic matches and the sparse taxonomy leaf containing only one sibling paper, the work appears to occupy a relatively novel position within parameter-space merging for reasoning-instruction integration. The analysis does not cover exhaustive literature on general task vector methods or broader model merging techniques outside the reasoning-instruction context, so the assessment reflects novelty within this specific problem framing rather than across all weight-space merging research.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 29
Refutable Papers: 0

Research Landscape Overview

Core task: merging large reasoning models with instruction-tuned models. The field has organized itself around several complementary strategies for combining specialized reasoning capabilities with broad instruction-following behavior. Model Merging and Integration Techniques explore weight-space methods and task-vector approaches that directly combine parameters from different expert models, exemplified by RAIN-Merging[0] and Disperse-then-Merge[5]. Training-Based Adaptation and Instruction Tuning focuses on fine-tuning strategies that teach models to follow instructions while preserving reasoning skills, with approaches ranging from mixture-of-experts architectures such as MoE Instruction Tuning[3] to continual-learning methods like Mammoth2[1]. Inference-Time Steering and Optimization investigates dynamic control mechanisms that adjust model behavior during generation without retraining. Meanwhile, Instruction-Following Evaluation and Benchmarking develops metrics and test suites to assess how well merged or adapted models balance reasoning depth with instruction adherence, and Multimodal Instruction Following extends these ideas beyond text to vision and speech, as seen in Training-free Multimodal[2].

A particularly active line of work examines the trade-offs between parameter efficiency and capability retention when merging models. Some studies pursue training-free integration to avoid catastrophic forgetting, while others accept modest retraining costs to achieve tighter alignment between reasoning traces and instruction semantics.

RAIN-Merging[0] sits squarely within the weight-space merging branch, emphasizing parameter-level fusion that preserves the distinct strengths of reasoning-specialized and instruction-tuned checkpoints. Compared to Disperse-then-Merge[5], which explores iterative dispersion and consolidation strategies, RAIN-Merging[0] relies on direct task-vector arithmetic to achieve a stable blend. This contrasts with training-heavy approaches like Two Experts[4] or MoE Instruction Tuning[3], which rely on learned routing or gating mechanisms. The central open question across these branches remains how to quantify and control the reasoning-instruction trade-off without extensive human evaluation, a challenge that motivates ongoing work in both merging algorithms and evaluation frameworks.

Claimed Contributions

RAIN-Merging method for integrating instruction-following into large reasoning models

The authors propose a two-stage gradient-free merging approach that combines an instruction-tuned model with a large reasoning model. The method uses null-space projection to preserve the reasoning structure and instruction-attention guided coefficients to enhance instruction adherence without requiring gradient-based training.

10 retrieved papers
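The two-stage pipeline described above can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the authors' implementation: `proj` stands in for the stage-1 null-space projector, `alpha` for the stage-2 attention-guided coefficient, and all weights are plain NumPy matrices keyed by module name.

```python
import numpy as np

def rain_merge_module(w_lrm, w_itm, w_base, proj, alpha):
    """Merge one module's weights in two stages (illustrative sketch)."""
    tau = w_itm - w_base              # ITM task vector for this module
    tau_safe = tau @ proj             # stage 1: null-space projection
    return w_lrm + alpha * tau_safe   # stage 2: module-specific scaling

def rain_merge(lrm, itm, base, projs, alphas):
    """Apply the per-module merge across dicts of named weight matrices."""
    return {name: rain_merge_module(lrm[name], itm[name], base[name],
                                    projs[name], alphas[name])
            for name in lrm}
```

With an identity projector and `alpha = 1` this reduces to plain task-vector addition, `lrm + (itm - base)`, which makes the gradient-free nature of the method easy to see: the merge is pure weight arithmetic.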
Reasoning-aware null-space projection technique

The first stage of RAIN-Merging projects the instruction-tuned model task vector onto the null space derived from forward features at thinking tokens. This projection maintains the large reasoning model's thinking format and output distribution while enabling integration of instruction-following capabilities.

9 retrieved papers
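One hedged way to realize this projection, assuming the calibration forward features at thinking tokens are stacked into a matrix `F` (one column per token), is an SVD-based projector onto the orthogonal complement of their span; the `rcond` rank cutoff is our assumption, not a detail from the paper.

```python
import numpy as np

def null_space_projector(F, rcond=1e-6):
    """Orthogonal projector onto the complement of col(F), where the columns
    of F are hidden states collected at thinking special tokens."""
    U, s, _ = np.linalg.svd(F, full_matrices=True)
    rank = int((s > rcond * s[0]).sum())
    U_null = U[:, rank:]              # orthonormal basis for the null space
    return U_null @ U_null.T

rng = np.random.default_rng(0)
d_in, k = 16, 4
F = rng.standard_normal((d_in, k))       # calibration features (d_in x k)
P = null_space_projector(F)
tau = rng.standard_normal((d_in, d_in))  # task vector of a square module
tau_safe = tau @ P
# tau_safe @ F is ~0: the projected update leaves thinking features untouched
```

Because `tau_safe @ F ≈ 0`, adding the projected task vector does not change the module's outputs on the calibration thinking-token inputs, which is the sense in which the LRM's thinking format is preserved.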
Instruction-attention guided merging coefficients

The second stage introduces per-module scaling coefficients based on attention outputs over instruction-related spans. These coefficients strengthen instruction-relevant behaviors by maximizing alignment with instruction tokens while minimizing attention leakage to unrelated content.

10 retrieved papers
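A plausible reading of this stage can be sketched as follows: score each module by the attention mass it places on instruction-token spans, then map scores to per-module merge coefficients. The min-max rescaling into a `[lo, hi]` range is our illustrative assumption; the paper's exact normalization may differ.

```python
import numpy as np

def instruction_attention_score(attn, instr_mask):
    """Mean attention mass on instruction tokens, averaged over heads and
    query positions; rows of `attn` are softmax distributions over keys."""
    return float(attn[..., instr_mask].sum(axis=-1).mean())

def merge_coefficients(scores, lo=0.2, hi=1.0):
    """Min-max rescale per-module scores into [lo, hi], so modules that
    attend more to instructions receive larger task-vector weight."""
    s = np.asarray(scores, dtype=float)
    t = (s - s.min()) / (s.max() - s.min() + 1e-9)
    return lo + t * (hi - lo)

# toy example: 2 heads, 5 queries, 6 keys; first 3 keys are instruction tokens
rng = np.random.default_rng(0)
logits = rng.standard_normal((2, 5, 6))
attn = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
mask = np.array([True, True, True, False, False, False])
score = instruction_attention_score(attn, mask)  # lies in [0, 1]
```

Under this reading, "minimizing attention leakage" corresponds to the complement `1 - score`: mass spent on non-instruction keys lowers a module's score and hence its merge coefficient.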

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution
