TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: Vector Quantization, KV Cache Compression, Nearest Neighbor Search, Similarity Search Acceleration, Online Compression Algorithms
Abstract:

Vector quantization, a problem rooted in Shannon's source coding theory, aims to quantize high-dimensional Euclidean vectors while minimizing distortion in their geometric structure. We propose TurboQuant to address both mean-squared error (MSE) and inner product distortion, overcoming limitations of existing methods that fail to achieve optimal distortion rates. Our data-oblivious algorithms, suitable for online applications, achieve near-optimal distortion rates (within a small constant factor) across all bit-widths and dimensions. TurboQuant achieves this by randomly rotating input vectors, inducing a concentrated Beta distribution on coordinates, and leveraging the near-independence of distinct coordinates in high dimensions to apply an optimal scalar quantizer to each coordinate. Recognizing that MSE-optimal quantizers introduce bias in inner product estimation, we propose a two-stage approach: applying an MSE quantizer followed by a 1-bit Quantized JL (QJL) transform on the residual, resulting in an unbiased inner product quantizer. We also provide a formal proof of information-theoretic lower bounds on the best distortion rate achievable by any vector quantizer, demonstrating that TurboQuant closely matches these bounds, differing only by a small constant factor (≈ 2.7). Experimental results validate our theoretical findings: for KV cache quantization, we achieve absolute quality neutrality with 3.5 bits per channel and marginal quality degradation with 2.5 bits per channel. Furthermore, in nearest neighbor search tasks, our method outperforms existing product quantization techniques in recall while reducing indexing time to virtually zero.
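The abstract's distributional claim can be checked numerically. A coordinate of a randomly rotated unit vector is distributed like a coordinate of a uniform point on the sphere, so its square follows Beta(1/2, (d-1)/2) with mean 1/d, which concentrates sharply as d grows. The sketch below (not the paper's code; all names are illustrative) verifies the mean empirically:

```python
import numpy as np

# A coordinate y_1 of a uniform point on the sphere S^{d-1} satisfies
# y_1^2 ~ Beta(1/2, (d-1)/2), hence E[y_1^2] = 1/d.
rng = np.random.default_rng(0)
d, n = 128, 20000

# Normalized Gaussian vectors are uniform on the sphere.
g = rng.standard_normal((n, d))
y = g / np.linalg.norm(g, axis=1, keepdims=True)

mean_sq = (y[:, 0] ** 2).mean()  # empirical E[y_1^2], close to 1/d
```

With d = 128, `mean_sq` lands near 1/128 ≈ 0.0078, illustrating the concentration that lets TurboQuant treat coordinates with a single scalar quantizer.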

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes TurboQuant, a data-oblivious vector quantization algorithm targeting near-optimal distortion rates for both mean-squared error and inner product preservation across all bit-widths and dimensions. It resides in the 'Rate-Distortion Theory and Asymptotic Limits' leaf, which contains five papers including the original work. This leaf sits within the broader 'Theoretical Foundations and Asymptotic Analysis' branch, indicating a focus on fundamental limits rather than application-specific heuristics. The taxonomy reveals this is a moderately populated research direction, with sibling papers examining asymptotic quantization error and rate-distortion tradeoffs, suggesting established but not overcrowded theoretical terrain.

The taxonomy tree shows neighboring leaves addressing lattice-based quantization theory and dithering techniques, both within the same theoretical foundations branch. Adjacent branches explore product quantization methods, residual hierarchical approaches, and randomized projection-based techniques. TurboQuant's use of random rotations and coordinate-wise scalar quantization connects it to rotation-based methods in the randomized quantization branch, yet its emphasis on provable distortion bounds and information-theoretic lower bounds firmly anchors it in the theoretical foundations category. The scope notes clarify that algorithmic design and empirical methods belong elsewhere, reinforcing this work's theoretical positioning.

Across the twenty-one candidates examined, the contribution-level analysis reveals varied novelty profiles. For the core TurboQuant algorithm, ten candidates were examined with zero refutations, suggesting limited direct overlap in the algorithmic approach within this search scope. For the two-stage unbiased inner product quantizer, only one candidate was examined, without refutation, indicating sparse prior work on this specific technique among the papers retrieved. For the information-theoretic lower bounds contribution, however, ten candidates were examined and one refutable match was found, implying that at least one prior work addresses a similar theoretical characterization of fundamental distortion limits.

Based on the top-twenty-one semantic matches examined, the algorithmic contributions appear relatively distinct within this search scope, while the theoretical lower bound analysis encounters some overlap. The taxonomy structure suggests the paper occupies a well-defined niche at the intersection of asymptotic theory and practical algorithm design, though the limited search scale means broader literature may contain additional relevant prior work not captured here. The analysis covers foundational theoretical papers and recent algorithmic innovations but does not claim exhaustive coverage of all vector quantization research.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 21
Refutable Papers: 1

Research Landscape Overview

Core task: Vector quantization with minimal distortion in high-dimensional Euclidean spaces. The field is organized around several complementary perspectives.

Theoretical Foundations and Asymptotic Analysis examines fundamental limits and rate-distortion trade-offs, often drawing on classical results like Asymptotic Quantization Error[12] and recent extensions in Quantization Asymptotics[28]. Product Quantization and Subspace Decomposition Methods partition high-dimensional spaces into manageable subspaces, enabling scalable codebook design through works such as Locally Optimized PQ[15] and Optimized Cartesian[25]. Residual and Hierarchical Quantization refines representations iteratively, as seen in Improved Residual VQ[9] and Hierarchical VQ[49]. Randomized and Projection-Based Quantization leverages dimensionality reduction and stochastic techniques, exemplified by Fast JL Transform[31] and Quantized Random Embeddings[46]. Application-Specific Quantization Methods tailor distortion measures to domains like speech (DNN Speech Spectra[17]) or medical signals (ECG Quantization[35]), while Algorithmic Optimization focuses on computational efficiency through methods like LBG Algorithm[27]. Specialized Distortion Measures address non-Euclidean metrics, including Angular Quantization[20] and Kernel Distortion[32].

Recent efforts balance theoretical rigor with practical deployment. A handful of works explore asymptotic limits and optimal lattice structures (Optimal Lattice VQ[16], Uniform Sphere Distribution[44]), seeking provable guarantees on distortion as dimensionality grows. Meanwhile, many studies pursue algorithmic innovations that reduce computational overhead without sacrificing accuracy, such as Expand and Quantize[4], GPTVQ[6], and RaBitQ[7]. TurboQuant[0] sits within the Theoretical Foundations branch, specifically addressing rate-distortion theory and asymptotic limits.
Compared to neighboring works like Asymptotic Quantization Error[12] and Quantization Asymptotics[28], TurboQuant[0] emphasizes achieving minimal distortion guarantees in high-dimensional regimes, contributing fresh perspectives on how quantization error scales with dimension and codebook size. This contrasts with more application-driven approaches like Practical Optimal Quantization[1] or High Dimensional VQ[5], which prioritize empirical performance over asymptotic characterization.

Claimed Contributions

TurboQuant algorithm for near-optimal vector quantization

The authors introduce TurboQuant, a data-oblivious vector quantization algorithm that achieves near-optimal distortion rates for both MSE and inner product metrics across all bit-widths and dimensions. The method randomly rotates input vectors to induce a concentrated Beta distribution on coordinates, then applies optimal scalar quantizers per coordinate.

Retrieved papers: 10
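The claimed pipeline (random rotation, then a per-coordinate scalar quantizer) can be sketched in a few lines. This is an illustrative stand-in, not the paper's implementation: the paper uses MSE-optimal scalar quantizers tuned to the induced Beta distribution, whereas the sketch below substitutes a uniform midpoint quantizer, and all function names are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_rotation(d):
    """Sample a Haar-random orthogonal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))  # sign fix for Haar uniformity

def quantize(x, rotation, bits=4):
    """Rotate, then apply a uniform per-coordinate scalar quantizer."""
    y = rotation @ x
    levels = 2 ** bits
    lo, hi = -1.0, 1.0  # coordinates of a rotated unit vector lie in [-1, 1]
    step = (hi - lo) / levels
    codes = np.clip(np.floor((y - lo) / step), 0, levels - 1).astype(int)
    return codes, step, lo

def dequantize(codes, step, lo, rotation):
    y_hat = lo + (codes + 0.5) * step  # midpoint reconstruction
    return rotation.T @ y_hat          # undo the rotation

d = 64
x = rng.standard_normal(d)
x /= np.linalg.norm(x)               # unit-norm input vector
R = random_rotation(d)
codes, step, lo = quantize(x, R, bits=4)
x_hat = dequantize(codes, step, lo, R)
mse = np.mean((x - x_hat) ** 2)      # small: bounded by step^2 / 4
```

Because the rotation is data-oblivious, encoding is a single matrix-vector product per input, which is what makes the scheme suitable for online settings with near-zero indexing time.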
Two-stage approach for unbiased inner product quantization

The authors develop a two-stage quantization method that first applies an MSE-optimal quantizer and then uses a 1-bit QJL transform on the residual error. This design addresses the bias inherent in MSE-optimal quantizers for inner product estimation, producing an unbiased estimator.

Retrieved papers: 1
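The two-stage construction can be sketched as follows. This is a simplified illustration under assumptions not taken from the paper: the first stage uses a plain uniform quantizer (standing in for the MSE-optimal one), the random rotation is omitted for brevity, and the residual norm is stored as side information. The unbiasedness of the second stage rests on the known identity E[sign(⟨g, r⟩)·⟨g, q⟩] = √(2/π)·⟨r, q⟩/‖r‖ for Gaussian g:

```python
import numpy as np

rng = np.random.default_rng(1)

def uniform_quantize(x, bits=4, lo=-1.0, hi=1.0):
    """Stage 1: uniform scalar quantizer (stand-in for the MSE-optimal one);
    returns the dequantized (biased) reconstruction."""
    levels = 2 ** bits
    step = (hi - lo) / levels
    codes = np.clip(np.floor((x - lo) / step), 0, levels - 1)
    return lo + (codes + 0.5) * step

def qjl_encode(r, S):
    """Stage 2: 1-bit quantized JL sketch of the residual (signs + norm)."""
    return np.sign(S @ r), np.linalg.norm(r)

def qjl_inner(signs, r_norm, S, query):
    """Unbiased estimate of <r, query> from the 1-bit sketch, using
    E[sign(<g, r>) <g, query>] = sqrt(2/pi) * <r, query> / ||r||."""
    m = S.shape[0]
    return np.sqrt(np.pi / 2) * r_norm / m * (signs @ (S @ query))

d, m = 64, 4096
x = rng.standard_normal(d); x /= np.linalg.norm(x)   # stored vector
q = rng.standard_normal(d); q /= np.linalg.norm(q)   # query vector

x_hat = uniform_quantize(x)        # stage 1: biased reconstruction
r = x - x_hat                      # residual carries the bias
S = rng.standard_normal((m, d))    # Gaussian sketch matrix
signs, r_norm = qjl_encode(r, S)   # stage 2: 1-bit QJL of residual

est = x_hat @ q + qjl_inner(signs, r_norm, S, q)  # unbiased <x, q> estimate
```

The first term is the biased stage-1 estimate; the QJL correction on the residual cancels the bias in expectation, so the combined estimator is unbiased for the inner product.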
Information-theoretic lower bounds on vector quantization distortion

The authors establish formal information-theoretic lower bounds on the distortion achievable by any vector quantizer and prove that TurboQuant approaches these bounds within a small constant factor (approximately 2.7), demonstrating near-optimality.

Retrieved papers: 10 (can refute)
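The report quotes the ≈2.7-factor near-optimality claim without stating the benchmark it is measured against. As orientation only (this is the textbook Shannon result for a Gaussian source, not the paper's specific theorem), the classical rate-distortion function pins down the best achievable MSE at R bits per coordinate:

```latex
R(D) \;=\; \tfrac{1}{2}\log_2\!\frac{\sigma^2}{D}
\qquad\Longleftrightarrow\qquad
D(R) \;=\; \sigma^2\, 2^{-2R}
```

Lower bounds of this type decay as 2^{-2b} in the bits-per-coordinate budget b; "near-optimal within a constant factor of ≈2.7" means the achieved distortion exceeds the information-theoretic limit by at most that constant, uniformly over bit-widths and dimensions (per the abstract).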

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

TurboQuant algorithm for near-optimal vector quantization


Contribution

Two-stage approach for unbiased inner product quantization


Contribution

Information-theoretic lower bounds on vector quantization distortion
