Global Resolution: Optimal Multi-Draft Speculative Sampling via Convex Optimization
Overview
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors prove that the canonical decomposition approach by Khisti et al. (2025) and the subset selection formulation by Hu et al. (2025) are mathematically equivalent to solving a relaxed optimal transport linear program. This unifies two recent theoretical frameworks for multi-draft speculative sampling.
The authors reformulate the optimal transport linear program as a max-flow problem and then apply polymatroid theory to reduce it to a tractable convex optimization problem with at most V variables, enabling efficient computation of the optimal verification criterion.
The authors develop the global resolution algorithm that computes the optimal transport solution for i.i.d. multi-draft speculative sampling to arbitrary accuracy. The algorithm achieves over 90% acceptance rates with under 100 ms overhead per token, representing the first practical multi-draft method with such performance.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
Contribution Analysis
Detailed comparisons for each claimed contribution
Equivalence of canonical decomposition and relaxed OTLP
The authors prove that the canonical decomposition approach by Khisti et al. (2025) and the subset selection formulation by Hu et al. (2025) are mathematically equivalent to solving a relaxed optimal transport linear program. This unifies two recent theoretical frameworks for multi-draft speculative sampling.
Max-flow and convex optimization reduction of OTLP
The authors reformulate the optimal transport linear program as a max-flow problem and then apply polymatroid theory to reduce it to a tractable convex optimization problem with at most V variables, enabling efficient computation of the optimal verification criterion.
Global resolution algorithm for optimal multi-draft speculative sampling
The authors develop the global resolution algorithm that computes the optimal transport solution for i.i.d. multi-draft speculative sampling to arbitrary accuracy. The algorithm achieves over 90% acceptance rates with under 100 ms overhead per token, representing the first practical multi-draft method with such performance.