ProxyThinker: Test-Time Guidance through Small Visual Reasoners

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: decoding-time algorithms, visual reasoning
Abstract:

Recent advancements in reinforcement learning with verifiable rewards have pushed the boundaries of the visual reasoning capabilities of large vision-language models (LVLMs). However, training LVLMs with reinforcement fine-tuning (RFT) is computationally expensive, posing a significant challenge to scaling model size. In this work, we propose ProxyThinker, an inference-time technique that enables large models to inherit the visual reasoning capabilities of small, slow-thinking visual reasoners without any training. By subtracting the output distributions of base models from those of RFT reasoners, ProxyThinker modifies the decoding dynamics and successfully elicits slow-thinking reasoning, as evidenced by emergent behaviors such as self-verification and self-correction. ProxyThinker consistently boosts performance on challenging visual benchmarks spanning spatial, mathematical, and multidisciplinary reasoning, enabling untuned base models to compete with their full-scale RFT counterparts. Furthermore, our implementation efficiently coordinates multiple language models with parallelism techniques and achieves faster inference than previous decoding-time methods, paving the way for the practical deployment of ProxyThinker. Code is available at https://anonymous.4open.science/r/ProxyThinker-FAAF.

Disclaimer
This report is AI-GENERATED using large language models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes ProxyThinker, an inference-time technique that transfers visual reasoning capabilities from small reinforcement-fine-tuned models to large base models by manipulating output distributions during decoding. It resides in the Distribution-Based Inference-Time Guidance leaf, which currently contains only this paper. This positioning suggests the work occupies a relatively sparse research direction within the broader taxonomy of test-time transfer methods, distinguishing it from the more heavily populated knowledge distillation and fine-tuning branches that dominate the field.

The taxonomy reveals neighboring approaches in Test-Time Inference Guidance, though no other papers share the exact distribution-based mechanism. Adjacent branches include Knowledge Distillation methods that require training phases and Reasoning Transfer techniques that generate synthetic data for fine-tuning. ProxyThinker diverges from these by avoiding model retraining entirely, instead leveraging runtime distribution subtraction. The scope boundaries clarify that training-based transfer methods belong elsewhere, positioning this work as a pure inference-time intervention that contrasts with the distillation-heavy approaches found in sibling branches like Cross-Modal Knowledge Distillation and Chain-of-Thought Reasoning Transfer.

Among 29 candidates examined across three contributions, the logit-delta steering mechanism shows overlap with one prior work, while the core ProxyThinker technique and parallelism implementation appear more novel within the limited search scope. Specifically, the logit-delta contribution examined 9 candidates with 1 refutable match, suggesting some precedent for distribution manipulation strategies. The other two contributions each examined 10 candidates with no clear refutations, indicating less direct prior work among the top semantic matches. These statistics reflect a focused literature search rather than exhaustive coverage, leaving open the possibility of additional related work beyond the top-K results.

Based on the limited search scope of 29 candidates, the work appears to introduce a relatively fresh approach to test-time reasoning transfer, particularly in its application to vision-language models. The sparse taxonomy leaf and low refutation rate suggest novelty, though the single overlap on logit-delta steering indicates the underlying distribution manipulation concept has precedent. The analysis covers top semantic matches and does not claim exhaustive field coverage.

Taxonomy

Core-task Taxonomy Papers: 15
Claimed Contributions: 3
Contribution Candidate Papers Compared: 29
Refutable Papers: 1

Research Landscape Overview

Core task: Test-time transfer of visual reasoning capabilities from small to large models. The field addresses how to leverage compact models to enhance the reasoning performance of larger vision-language systems without retraining.

The taxonomy reveals four main branches. Test-Time Inference Guidance and Decoding Methods focus on runtime mechanisms that steer large model outputs using signals from smaller models, often through distribution alignment or process-level guidance (e.g., Virgo[3], Vision Language Process Rewards[4]). Knowledge Distillation for Vision-Language Transfer encompasses techniques that compress reasoning patterns from teachers to students, including compositional and modality-balanced approaches (e.g., CompoDistill[9], Mitigate Modality Imbalance[7]). Reasoning Transfer via Fine-Tuning and Data Generation explores how synthetic data or intermediate reasoning steps can be generated by small models to improve larger ones (e.g., Step Back Visual Reasoning[10], Symbolic to Object Embeddings[11]). Generalization and Robustness in Visual Reasoning examines how transferred capabilities hold up across diverse tasks and domains, addressing questions of transferability and compositional understanding.

A central tension across these branches is whether to intervene at inference time or at training time, and whether to rely on explicit symbolic reasoning or implicit distributional guidance. ProxyThinker[0] sits within the Distribution-Based Inference-Time Guidance cluster, emphasizing runtime transfer without model updates. This contrasts with distillation-focused works like Reasoning Teachers[5] or Pretraining Knowledge Distillation[8], which bake reasoning patterns into model weights during training.

Compared to process-reward methods such as Vision Language Process Rewards[4], ProxyThinker[0] appears to leverage distributional alignment from a smaller proxy model to guide the larger model's decoding, offering a lightweight alternative that avoids the overhead of reward modeling or fine-tuning. The broader challenge remains how to balance the efficiency gains of small-model guidance with the need for robust generalization across varied visual reasoning tasks.

Claimed Contributions

ProxyThinker inference-time technique for visual reasoning transfer

The authors introduce ProxyThinker, a training-free decoding method that transfers visual reasoning abilities from small RFT-trained models to large base models by subtracting amateur model logits from expert model logits during inference, enabling slow-thinking reasoning behaviors without parameter updates.
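The logit arithmetic described above can be sketched in a few lines. This is a minimal illustration of the claimed decoding rule, not the authors' implementation: the function name `proxythinker_logits` and the scaling coefficient `alpha` are our own notation, and the toy four-token vocabulary is fabricated for demonstration.

```python
import numpy as np

def proxythinker_logits(base, expert, amateur, alpha=1.0):
    """Combine per-step logits as base + alpha * (expert - amateur).

    `alpha` is a hypothetical guidance-strength hyperparameter; the
    report does not specify the value used by the authors.
    """
    return np.asarray(base) + alpha * (np.asarray(expert) - np.asarray(amateur))

def greedy_token(logits):
    """Pick the highest-scoring token id (greedy decoding)."""
    return int(np.argmax(logits))

# Toy vocabulary of 4 tokens.
base    = [2.0, 1.0, 0.5, 0.0]  # large base model logits
expert  = [0.0, 3.0, 0.0, 0.0]  # small RFT reasoner strongly prefers token 1
amateur = [0.0, 0.0, 0.0, 0.0]  # small pre-RFT (amateur) model

token = greedy_token(proxythinker_logits(base, expert, amateur, alpha=1.0))
```

With these toy values, the expert's preference shifts the combined distribution so that token 1 wins greedy decoding even though the base model alone preferred token 0.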

10 retrieved papers
Efficient multi-model implementation with parallelism techniques

The authors develop an optimized implementation on vLLM that leverages tensor parallelism and asynchronous execution across multiple models, achieving 38× speedup over previous decoding-time steering methods while minimizing GPU idle time.
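The asynchronous-execution idea can be illustrated with a small sketch: the three per-step forward passes are launched concurrently so that no model blocks the others. The `forward` coroutine, model names, and delays below are all stand-ins for the real vLLM calls, which this report does not detail.

```python
import asyncio

async def forward(model_name, latency_s):
    # Stand-in for a per-model forward pass producing next-token logits.
    # In the real system this would be a vLLM engine call; here we just
    # simulate latency and return a tagged placeholder.
    await asyncio.sleep(latency_s)
    return f"{model_name}:logits"

async def decode_step():
    # Run all three forward passes concurrently rather than sequentially,
    # mirroring the asynchronous execution described above. The wall-clock
    # cost of the step is then roughly max(latencies), not their sum.
    return await asyncio.gather(
        forward("base-72B", 0.03),
        forward("expert-7B", 0.01),
        forward("amateur-7B", 0.01),
    )

results = asyncio.run(decode_step())
```

The combined logits for one decoding step would then be computed from the three returned tensors, and the chosen token broadcast back to every model's running context.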

10 retrieved papers
Logit-delta steering mechanism for reasoning behavior transfer

The method modifies next-token prediction by adding a scaled difference between expert and amateur model logits to the base model logits, successfully transferring sophisticated reasoning behaviors such as self-verification and self-correction from small to large models.
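The steering rule described above can be written compactly as follows. The symbols are ours, not the paper's: \(z_t\) denotes the next-token logit vector at step \(t\) for each model, and \(\alpha\) is a guidance-strength coefficient.

```latex
% Logit-delta steering (notation assumed for illustration):
\tilde{z}_t = z_t^{\text{base}} + \alpha\,\bigl(z_t^{\text{expert}} - z_t^{\text{amateur}}\bigr),
\qquad
p(x_t \mid x_{<t}) = \operatorname{softmax}(\tilde{z}_t)
```

Since the expert and amateur share the small model's architecture and differ only by RFT, their logit difference isolates the behavioral shift induced by reinforcement fine-tuning, which is then applied on top of the large base model's distribution.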

9 retrieved papers
Can Refute

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current TopK core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape, it appears structurally isolated, which is one partial signal of novelty, but still constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

ProxyThinker inference-time technique for visual reasoning transfer

The authors introduce ProxyThinker, a training-free decoding method that transfers visual reasoning abilities from small RFT-trained models to large base models by subtracting amateur model logits from expert model logits during inference, enabling slow-thinking reasoning behaviors without parameter updates.

Contribution

Efficient multi-model implementation with parallelism techniques

The authors develop an optimized implementation on vLLM that leverages tensor parallelism and asynchronous execution across multiple models, achieving 38× speedup over previous decoding-time steering methods while minimizing GPU idle time.

Contribution

Logit-delta steering mechanism for reasoning behavior transfer

The method modifies next-token prediction by adding a scaled difference between expert and amateur model logits to the base model logits, successfully transferring sophisticated reasoning behaviors such as self-verification and self-correction from small to large models.