ProxyThinker: Test-Time Guidance through Small Visual Reasoners
Overview
Overall Novelty Assessment
The paper proposes ProxyThinker, an inference-time technique that transfers visual reasoning capabilities from small reinforcement-fine-tuned (RFT) models to large base models by manipulating output distributions during decoding. It sits in the Distribution-Based Inference-Time Guidance leaf, which currently contains no other papers. This positioning suggests the work occupies a relatively sparse research direction within the broader taxonomy of test-time transfer methods, distinct from the more heavily populated knowledge distillation and fine-tuning branches that dominate the field.
The taxonomy reveals neighboring approaches in Test-Time Inference Guidance, though no other papers share the exact distribution-based mechanism. Adjacent branches include Knowledge Distillation methods that require training phases and Reasoning Transfer techniques that generate synthetic data for fine-tuning. ProxyThinker diverges from these by avoiding model retraining entirely, instead leveraging runtime distribution subtraction. The scope boundaries clarify that training-based transfer methods belong elsewhere, positioning this work as a pure inference-time intervention that contrasts with the distillation-heavy approaches found in sibling branches like Cross-Modal Knowledge Distillation and Chain-of-Thought Reasoning Transfer.
Among 29 candidates examined across the three contributions, the logit-delta steering mechanism overlaps with one prior work, while the core ProxyThinker technique and the parallel implementation appear more novel within the limited search scope. Specifically, the logit-delta contribution was checked against 9 candidates, one of which refutes its novelty, suggesting some precedent for distribution-manipulation strategies. The other two contributions were each checked against 10 candidates with no clear refutations, indicating less direct prior work among the top semantic matches. These statistics reflect a focused literature search rather than exhaustive coverage, so additional related work may exist beyond the top-K results.
Based on the limited search scope of 29 candidates, the work appears to introduce a relatively fresh approach to test-time reasoning transfer, particularly in its application to vision-language models. The sparse taxonomy leaf and low refutation rate suggest novelty, though the single overlap on logit-delta steering indicates the underlying distribution manipulation concept has precedent. The analysis covers top semantic matches and does not claim exhaustive field coverage.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors introduce ProxyThinker, a training-free decoding method that transfers visual reasoning abilities from small reinforcement-fine-tuned (RFT) models to large base models by subtracting the amateur model's logits from the expert model's logits during inference, enabling slow-thinking reasoning behaviors without any parameter updates.
The authors develop an optimized implementation on vLLM that leverages tensor parallelism and asynchronous execution across multiple models, achieving a 38× speedup over previous decoding-time steering methods while minimizing GPU idle time.
The method modifies next-token prediction by adding a scaled difference between expert and amateur model logits to the base model logits, successfully transferring sophisticated reasoning behaviors such as self-verification and self-correction from small to large models.
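The logit-combination rule described in these contributions can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the names `proxythinker_logits`, `alpha`, and the toy four-token vocabulary are assumptions introduced here for clarity.

```python
import numpy as np

def proxythinker_logits(base, expert, amateur, alpha=1.0):
    """Combine per-token logits as described: base + alpha * (expert - amateur).

    `base`, `expert`, and `amateur` are same-shaped vocab-sized logit vectors
    from the large base model and the small expert/amateur pair; `alpha` is a
    guidance scale (illustrative names, not from the paper's code).
    """
    return base + alpha * (expert - amateur)

def sample_greedy(logits):
    # Greedy decoding over the steered distribution.
    return int(np.argmax(logits))

# Toy vocab of size 4: the expert strongly prefers token 2 relative to
# the amateur, so the delta pushes the base model toward that token.
base    = np.array([2.0, 1.0, 0.5, 0.0])
expert  = np.array([0.0, 0.0, 3.0, 0.0])
amateur = np.array([0.0, 0.0, 0.5, 0.0])

steered = proxythinker_logits(base, expert, amateur, alpha=1.0)
print(sample_greedy(base))     # base alone picks token 0
print(sample_greedy(steered))  # steering shifts the choice to token 2
```

In an actual decoding loop this combination would be applied at every step before sampling, so the base model's fluency is preserved while the expert-amateur delta injects the RFT-learned reasoning preferences.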
Core Task Comparisons
Comparisons with papers in the same taxonomy category
Contribution Analysis
Detailed comparisons for each claimed contribution
ProxyThinker inference-time technique for visual reasoning transfer
The authors introduce ProxyThinker, a training-free decoding method that transfers visual reasoning abilities from small reinforcement-fine-tuned (RFT) models to large base models by subtracting the amateur model's logits from the expert model's logits during inference, enabling slow-thinking reasoning behaviors without any parameter updates.
[7] Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs? PDF
[16] Investigating inference-time scaling for chain of multi-modal thought: A preliminary study PDF
[17] Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time PDF
[18] Chain-of-Sketch: Enabling Global Visual Reasoning PDF
[19] Beyond Embeddings: The Promise of Visual Table in Visual Reasoning PDF
[20] Big Reasoning with Small Models: Instruction Retrieval at Inference Time PDF
[21] Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning PDF
[22] Aurelia: Test-time reasoning distillation in audio-visual llms PDF
[23] TT-MPD: Test Time Model Pruning and Distillation PDF
[24] Beyond answers: Transferring reasoning capabilities to smaller llms using multi-teacher knowledge distillation PDF
Efficient multi-model implementation with parallelism techniques
The authors develop an optimized implementation on vLLM that leverages tensor parallelism and asynchronous execution across multiple models, achieving a 38× speedup over previous decoding-time steering methods while minimizing GPU idle time.
[34] Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads PDF
[35] Accelerating Large Language Model Decoding with Speculative Sampling PDF
[36] Fast Inference from Transformers via Speculative Decoding PDF
[37] Parallel Secure Inference for Multiple Models Based on CKKS PDF
[38] Falcon: Faster and parallel inference of large language models through enhanced semi-autoregressive drafting and custom-designed decoding tree PDF
[39] Easyspec: Layer-parallel speculative decoding for efficient multi-gpu utilization PDF
[40] Hogwild! inference: Parallel llm generation via concurrent attention PDF
[41] Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding PDF
[42] A survey on parallel reasoning PDF
[43] Pearl: Parallel speculative decoding with adaptive draft length PDF
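The asynchronous-execution idea behind this contribution can be sketched as follows: run the three per-token forward passes concurrently so that each decoding step is bounded by the slowest model rather than the sum of all three. This is a toy illustration with simulated forward passes standing in for vLLM engines; `forward` and `step_parallel` are hypothetical names, not the paper's API.

```python
import time
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def forward(model_name, delay=0.05):
    """Stand-in for one model's forward pass (in the paper's setting,
    a vLLM engine for the base, expert, or amateur model)."""
    time.sleep(delay)  # simulate GPU compute time
    rng = np.random.default_rng(abs(hash(model_name)) % 2**32)
    return rng.standard_normal(8)  # toy vocab of 8 logits

def step_parallel(alpha=1.0):
    # Launch the three forward passes concurrently: per-token latency is
    # then bounded by the slowest pass, not the sum of all three.
    with ThreadPoolExecutor(max_workers=3) as pool:
        futs = {name: pool.submit(forward, name)
                for name in ("base", "expert", "amateur")}
        logits = {name: f.result() for name, f in futs.items()}
    return logits["base"] + alpha * (logits["expert"] - logits["amateur"])

start = time.perf_counter()
steered = step_parallel()
elapsed = time.perf_counter() - start
print(f"steered logits shape: {steered.shape}, wall time ~{elapsed:.2f}s")
```

With three simulated 0.05 s passes, the concurrent step finishes in roughly 0.05 s rather than 0.15 s; the real implementation additionally shards each model via tensor parallelism, which this sketch does not attempt to show.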
Logit-delta steering mechanism for reasoning behavior transfer
The method modifies next-token prediction by adding a scaled difference between expert and amateur model logits to the base model logits, successfully transferring sophisticated reasoning behaviors such as self-verification and self-correction from small to large models.
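The effect of the scaled difference can be made concrete with a small numeric example: as the guidance scale grows, probability mass shifts toward tokens the expert favors over the amateur. The scale name `alpha`, the three-token vocabulary, and the "verify token" framing are illustrative assumptions, not values from the paper.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# Toy logits: the expert-amateur delta favors a "verify" token (index 1)
# that the base model would rarely emit on its own (illustrative numbers).
base  = np.array([3.0, 0.0, 1.0])
delta = np.array([0.0, 4.0, 0.0])  # expert logits minus amateur logits

probs = []
for alpha in (0.0, 0.5, 1.0):
    p = softmax(base + alpha * delta)
    probs.append(p[1])
    print(f"alpha={alpha}: P(verify token) = {p[1]:.3f}")
```

In this toy setting the steered probability of the verify token rises from about 0.04 at `alpha=0` (base model alone) to about 0.71 at `alpha=1`, which is the intended mechanism for surfacing behaviors such as self-verification that the base model alone rarely produces.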