TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: Vector Quantization, KV Cache Compression, Nearest Neighbor Search, Similarity Search Acceleration, Online Compression Algorithms
Abstract:

Vector quantization, a problem rooted in Shannon's source coding theory, aims to quantize high-dimensional Euclidean vectors while minimizing distortion in their geometric structure. We propose TurboQuant to address both mean-squared error (MSE) and inner product distortion, overcoming limitations of existing methods that fail to achieve optimal distortion rates. Our data-oblivious algorithms, suitable for online applications, achieve near-optimal distortion rates (within a small constant factor) across all bit-widths and dimensions. TurboQuant achieves this by randomly rotating input vectors, inducing a concentrated Beta distribution on coordinates, and leveraging the near-independence of distinct coordinates in high dimensions to apply an optimal scalar quantizer to each coordinate. Recognizing that MSE-optimal quantizers introduce bias in inner product estimation, we propose a two-stage approach: applying an MSE quantizer followed by a 1-bit Quantized JL (QJL) transform on the residual, resulting in an unbiased inner product quantizer. We also provide a formal proof of information-theoretic lower bounds on the best distortion rate achievable by any vector quantizer, demonstrating that TurboQuant closely matches these bounds, differing only by a small constant factor (≈ 2.7). Experimental results validate our theoretical findings: for KV cache quantization, we achieve absolute quality neutrality with 3.5 bits per channel and marginal quality degradation with 2.5 bits per channel. Furthermore, in nearest neighbor search tasks, our method outperforms existing product quantization techniques in recall while reducing indexing time to virtually zero.
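The abstract's distributional claim can be checked numerically. A coordinate of a randomly rotated unit vector is distributed like a coordinate of a uniform point on the sphere, so its square follows Beta(1/2, (d-1)/2) with mean 1/d, which concentrates sharply as d grows. The sketch below (not the paper's code; all names are illustrative) verifies the mean empirically:

```python
import numpy as np

# A coordinate y_1 of a uniform point on the sphere S^{d-1} satisfies
# y_1^2 ~ Beta(1/2, (d-1)/2), hence E[y_1^2] = 1/d.
rng = np.random.default_rng(0)
d, n = 128, 20000

# Normalized Gaussian vectors are uniform on the sphere.
g = rng.standard_normal((n, d))
y = g / np.linalg.norm(g, axis=1, keepdims=True)

mean_sq = (y[:, 0] ** 2).mean()  # empirical E[y_1^2], close to 1/d
```

With d = 128, `mean_sq` lands near 1/128 ≈ 0.0078, illustrating the concentration that lets TurboQuant treat coordinates with a single scalar quantizer.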

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes TurboQuant, a data-oblivious vector quantization algorithm targeting near-optimal distortion rates for both mean-squared error and inner product preservation across all bit-widths and dimensions. It resides in the 'Rate-Distortion Theory and Asymptotic Limits' leaf, which contains five papers including the original work. This leaf sits within the broader 'Theoretical Foundations and Asymptotic Analysis' branch, indicating a focus on fundamental limits rather than application-specific heuristics. The taxonomy reveals this is a moderately populated research direction, with sibling papers examining asymptotic quantization error and rate-distortion tradeoffs, suggesting established but not overcrowded theoretical terrain.

The taxonomy tree shows neighboring leaves addressing lattice-based quantization theory and dithering techniques, both within the same theoretical foundations branch. Adjacent branches explore product quantization methods, residual hierarchical approaches, and randomized projection-based techniques. TurboQuant's use of random rotations and coordinate-wise scalar quantization connects it to rotation-based methods in the randomized quantization branch, yet its emphasis on provable distortion bounds and information-theoretic lower bounds firmly anchors it in the theoretical foundations category. The scope notes clarify that algorithmic design and empirical methods belong elsewhere, reinforcing this work's theoretical positioning.

Across the twenty-one candidates examined, the contribution-level analysis reveals varied novelty profiles. For the core TurboQuant algorithm, ten candidates were examined with zero refutations, suggesting limited direct overlap in the algorithmic approach within this search scope. For the two-stage unbiased inner product quantizer, only one candidate was examined, without refutation, indicating sparse prior work on this specific technique among the papers retrieved. For the information-theoretic lower bounds contribution, however, ten candidates were examined and one refutable match was found, implying that at least one prior work addresses a similar theoretical characterization of fundamental distortion limits.

Based on the top-twenty-one semantic matches examined, the algorithmic contributions appear relatively distinct within this search scope, while the theoretical lower bound analysis encounters some overlap. The taxonomy structure suggests the paper occupies a well-defined niche at the intersection of asymptotic theory and practical algorithm design, though the limited search scale means broader literature may contain additional relevant prior work not captured here. The analysis covers foundational theoretical papers and recent algorithmic innovations but does not claim exhaustive coverage of all vector quantization research.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 21
Refutable Papers: 1

Research Landscape Overview

Core task: Vector quantization with minimal distortion in high-dimensional Euclidean spaces. The field is organized around several complementary perspectives.

Theoretical Foundations and Asymptotic Analysis examines fundamental limits and rate-distortion trade-offs, often drawing on classical results like Asymptotic Quantization Error[12] and recent extensions in Quantization Asymptotics[28]. Product Quantization and Subspace Decomposition Methods partition high-dimensional spaces into manageable subspaces, enabling scalable codebook design through works such as Locally Optimized PQ[15] and Optimized Cartesian[25]. Residual and Hierarchical Quantization refines representations iteratively, as seen in Improved Residual VQ[9] and Hierarchical VQ[49]. Randomized and Projection-Based Quantization leverages dimensionality reduction and stochastic techniques, exemplified by Fast JL Transform[31] and Quantized Random Embeddings[46]. Application-Specific Quantization Methods tailor distortion measures to domains like speech (DNN Speech Spectra[17]) or medical signals (ECG Quantization[35]), while Algorithmic Optimization focuses on computational efficiency through methods like LBG Algorithm[27]. Specialized Distortion Measures address non-Euclidean metrics, including Angular Quantization[20] and Kernel Distortion[32].

Recent efforts balance theoretical rigor with practical deployment. A handful of works explore asymptotic limits and optimal lattice structures (Optimal Lattice VQ[16], Uniform Sphere Distribution[44]), seeking provable guarantees on distortion as dimensionality grows. Meanwhile, many studies pursue algorithmic innovations that reduce computational overhead without sacrificing accuracy, such as Expand and Quantize[4], GPTVQ[6], and RaBitQ[7]. TurboQuant[0] sits within the Theoretical Foundations branch, specifically addressing rate-distortion theory and asymptotic limits.
Compared to neighboring works like Asymptotic Quantization Error[12] and Quantization Asymptotics[28], TurboQuant[0] emphasizes achieving minimal distortion guarantees in high-dimensional regimes, contributing fresh perspectives on how quantization error scales with dimension and codebook size. This contrasts with more application-driven approaches like Practical Optimal Quantization[1] or High Dimensional VQ[5], which prioritize empirical performance over asymptotic characterization.

Claimed Contributions

TurboQuant algorithm for near-optimal vector quantization

The authors introduce TurboQuant, a data-oblivious vector quantization algorithm that achieves near-optimal distortion rates for both MSE and inner product metrics across all bit-widths and dimensions. The method randomly rotates input vectors to induce a concentrated Beta distribution on coordinates, then applies optimal scalar quantizers per coordinate.

Retrieved papers: 10
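The claimed pipeline (random rotation, then a per-coordinate scalar quantizer) can be sketched in a few lines. This is an illustrative stand-in, not the paper's implementation: the paper uses MSE-optimal scalar quantizers tuned to the induced Beta distribution, whereas the sketch below substitutes a uniform midpoint quantizer, and all function names are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_rotation(d):
    """Sample a Haar-random orthogonal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))  # sign fix for Haar uniformity

def quantize(x, rotation, bits=4):
    """Rotate, then apply a uniform per-coordinate scalar quantizer."""
    y = rotation @ x
    levels = 2 ** bits
    lo, hi = -1.0, 1.0  # coordinates of a rotated unit vector lie in [-1, 1]
    step = (hi - lo) / levels
    codes = np.clip(np.floor((y - lo) / step), 0, levels - 1).astype(int)
    return codes, step, lo

def dequantize(codes, step, lo, rotation):
    y_hat = lo + (codes + 0.5) * step  # midpoint reconstruction
    return rotation.T @ y_hat          # undo the rotation

d = 64
x = rng.standard_normal(d)
x /= np.linalg.norm(x)               # unit-norm input vector
R = random_rotation(d)
codes, step, lo = quantize(x, R, bits=4)
x_hat = dequantize(codes, step, lo, R)
mse = np.mean((x - x_hat) ** 2)      # small: bounded by step^2 / 4
```

Because the rotation is data-oblivious, encoding is a single matrix-vector product per input, which is what makes the scheme suitable for online settings with near-zero indexing time.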
Two-stage approach for unbiased inner product quantization

The authors develop a two-stage quantization method that first applies an MSE-optimal quantizer and then uses a 1-bit QJL transform on the residual error. This design addresses the bias inherent in MSE-optimal quantizers for inner product estimation, producing an unbiased estimator.

Retrieved papers: 1
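The two-stage construction can be sketched as follows. This is a simplified illustration under assumptions not taken from the paper: the first stage uses a plain uniform quantizer (standing in for the MSE-optimal one), the random rotation is omitted for brevity, and the residual norm is stored as side information. The unbiasedness of the second stage rests on the known identity E[sign(⟨g, r⟩)·⟨g, q⟩] = √(2/π)·⟨r, q⟩/‖r‖ for Gaussian g:

```python
import numpy as np

rng = np.random.default_rng(1)

def uniform_quantize(x, bits=4, lo=-1.0, hi=1.0):
    """Stage 1: uniform scalar quantizer (stand-in for the MSE-optimal one);
    returns the dequantized (biased) reconstruction."""
    levels = 2 ** bits
    step = (hi - lo) / levels
    codes = np.clip(np.floor((x - lo) / step), 0, levels - 1)
    return lo + (codes + 0.5) * step

def qjl_encode(r, S):
    """Stage 2: 1-bit quantized JL sketch of the residual (signs + norm)."""
    return np.sign(S @ r), np.linalg.norm(r)

def qjl_inner(signs, r_norm, S, query):
    """Unbiased estimate of <r, query> from the 1-bit sketch, using
    E[sign(<g, r>) <g, query>] = sqrt(2/pi) * <r, query> / ||r||."""
    m = S.shape[0]
    return np.sqrt(np.pi / 2) * r_norm / m * (signs @ (S @ query))

d, m = 64, 4096
x = rng.standard_normal(d); x /= np.linalg.norm(x)   # stored vector
q = rng.standard_normal(d); q /= np.linalg.norm(q)   # query vector

x_hat = uniform_quantize(x)        # stage 1: biased reconstruction
r = x - x_hat                      # residual carries the bias
S = rng.standard_normal((m, d))    # Gaussian sketch matrix
signs, r_norm = qjl_encode(r, S)   # stage 2: 1-bit QJL of residual

est = x_hat @ q + qjl_inner(signs, r_norm, S, q)  # unbiased <x, q> estimate
```

The first term is the biased stage-1 estimate; the QJL correction on the residual cancels the bias in expectation, so the combined estimator is unbiased for the inner product.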
Information-theoretic lower bounds on vector quantization distortion

The authors establish formal information-theoretic lower bounds on the distortion achievable by any vector quantizer and prove that TurboQuant approaches these bounds within a small constant factor (approximately 2.7), demonstrating near-optimality.

Retrieved papers: 10 (can refute)
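The report quotes the ≈2.7-factor near-optimality claim without stating the benchmark it is measured against. As orientation only (this is the textbook Shannon result for a Gaussian source, not the paper's specific theorem), the classical rate-distortion function pins down the best achievable MSE at R bits per coordinate:

```latex
R(D) \;=\; \tfrac{1}{2}\log_2\!\frac{\sigma^2}{D}
\qquad\Longleftrightarrow\qquad
D(R) \;=\; \sigma^2\, 2^{-2R}
```

Lower bounds of this type decay as 2^{-2b} in the bits-per-coordinate budget b; "near-optimal within a constant factor of ≈2.7" means the achieved distortion exceeds the information-theoretic limit by at most that constant, uniformly over bit-widths and dimensions (per the abstract).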

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

TurboQuant algorithm for near-optimal vector quantization


Contribution

Two-stage approach for unbiased inner product quantization


Contribution

Information-theoretic lower bounds on vector quantization distortion
