QVGen: Pushing the Limit of Quantized Video Generative Models
Research Landscape Overview
Claimed Contributions
The authors present QVGen, the first quantization-aware training (QAT) framework designed specifically for video diffusion models. It enables effective 4-bit and even 3-bit quantization while achieving quality comparable to full precision, addressing the failure of existing QAT methods on video generation under extremely low-bit settings.
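To make the low-bit setting concrete, the sketch below shows generic symmetric "fake quantization", the quantize-dequantize step a QAT forward pass runs through (with gradients usually passed via a straight-through estimator). This is a minimal per-tensor scheme for illustration, not QVGen's exact quantizer.

```python
import numpy as np

def fake_quantize(w, bits=4):
    """Symmetric uniform quantize-dequantize ("fake quantization").

    QAT runs the forward pass through this rounding step; backprop
    typically uses a straight-through estimator. Generic per-tensor
    scheme for illustration -- not QVGen's exact quantizer.
    """
    qmax = 2 ** (bits - 1) - 1            # e.g. 7 for 4-bit signed
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale                      # dequantized floats on a coarse grid

w = np.linspace(-1.0, 1.0, 9)
w4 = fake_quantize(w, bits=4)             # fine grid, small rounding error
w3 = fake_quantize(w, bits=3)             # coarser grid, larger error
```

Dropping from 4 to 3 bits halves the number of representable levels, which is why error at 3 bits is markedly harder to recover from and motivates QAT over post-training quantization.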
The authors introduce learnable auxiliary modules that mitigate quantization error during training. Their theoretical analysis shows that reducing the gradient norm is essential for QAT convergence; the auxiliary modules lower this norm, stabilizing training and markedly improving convergence under extremely low-bit quantization.
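The idea of a low-rank auxiliary path alongside a quantized layer can be sketched as follows. In the paper the module Φ is trained jointly with the network; here, purely for illustration, Φ = B @ A is initialized from the truncated SVD of the quantization error to show how such a module can absorb the dominant error directions (the parameterization and initialization are assumptions, not the paper's exact method).

```python
import numpy as np

def quantize(W, bits=3):
    # Generic symmetric per-tensor quantizer (illustrative).
    qmax = 2 ** (bits - 1) - 1
    s = np.abs(W).max() / qmax
    return np.clip(np.round(W / s), -qmax, qmax) * s

rng = np.random.default_rng(0)
W = rng.standard_normal((32, 32))         # full-precision weight
Wq = quantize(W, bits=3)                  # its 3-bit quantized version

# Low-rank auxiliary Phi = B @ A, initialized from the top-r SVD of the
# quantization error E = W - Wq (illustrative choice).
r = 4
U, S, Vt = np.linalg.svd(W - Wq)
B = U[:, :r] * S[:r]                      # (32, r)
A = Vt[:r]                                # (r, 32)

x = rng.standard_normal(32)
err_plain = np.linalg.norm(W @ x - Wq @ x)                  # quantized only
err_aux = np.linalg.norm(W @ x - (Wq @ x + B @ (A @ x)))    # with auxiliary path
```

Because B @ A is the best rank-r approximation of the error matrix, subtracting it can only shrink the output error, which is the intuition for why such modules ease low-bit training.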
The authors develop a rank-decay strategy that progressively removes the auxiliary modules during training so that they add no inference overhead. The strategy repeatedly applies singular value decomposition (SVD) and a rank-based regularization to identify low-contributing components and decay them, ultimately reaching zero additional inference cost while preserving performance.
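The decay mechanism can be sketched as repeated truncated SVD on a low-rank module Φ = B @ A: at each step the smallest singular directions (the low-contributing components) are dropped, until the rank reaches zero and the module vanishes. The schedule and the rank-based regularizer used in the paper are not reproduced here; this only illustrates the SVD-truncation step.

```python
import numpy as np

def decay_rank(B, A, keep):
    # Recompose Phi = B @ A, take its SVD, and keep only the top-`keep`
    # singular directions; the discarded tail holds the low-contributing
    # components. Repeated with a shrinking `keep`, Phi decays to rank 0.
    U, S, Vt = np.linalg.svd(B @ A, full_matrices=False)
    return U[:, :keep] * S[:keep], Vt[:keep]

rng = np.random.default_rng(0)
B = rng.standard_normal((32, 8))          # auxiliary module of initial rank 8
A = rng.standard_normal((8, 32))

ranks = []
for keep in (6, 4, 2, 0):                 # assumed decay schedule (illustrative)
    B, A = decay_rank(B, A, keep)
    ranks.append(B.shape[1])
# At rank 0 the auxiliary path contributes nothing, so it can be removed
# entirely at inference time.
```

Once the rank hits zero, B @ A is the zero matrix, matching the claim of zero additional inference cost.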
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[23] FraQAT: Quantization Aware Training with Fractional bits PDF
[31] DilateQuant: Accurate and Efficient Quantization-Aware Training for Diffusion Models via Weight Dilation PDF
[36] DilateQuant: Accurate and Efficient Diffusion Quantization via Weight Dilation PDF
Contribution Analysis
Detailed comparisons for each claimed contribution
QVGen: A novel QAT framework for video diffusion models
The authors present QVGen, the first quantization-aware training (QAT) framework designed specifically for video diffusion models. It enables effective 4-bit and even 3-bit quantization while achieving quality comparable to full precision, addressing the failure of existing QAT methods on video generation under extremely low-bit settings.
[5] Quantization as a Foundation for Deployable High Performance Diffusion Models within the Landscape of Large Scale Generative AI PDF
[7] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation PDF
[17] Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers PDF
[19] QVD: Post-training Quantization for Video Diffusion Models PDF
[54] BiDM: Pushing the Limit of Quantization for Diffusion Models PDF
[55] PTQ4DiT: Post-training Quantization for Diffusion Transformers PDF
[56] Q-DM: An Efficient Low-bit Quantized Diffusion Model PDF
[57] MPQ-DM: Mixed Precision Quantization for Extremely Low Bit Diffusion Models PDF
[58] MPQ-DMv2: Flexible Residual Mixed Precision Quantization for Low-Bit Diffusion Models with Temporal Distillation PDF
Auxiliary modules (Φ) to reduce gradient norm and improve convergence
The authors introduce learnable auxiliary modules that mitigate quantization error during training. Their theoretical analysis shows that reducing the gradient norm is essential for QAT convergence; the auxiliary modules lower this norm, stabilizing training and markedly improving convergence under extremely low-bit quantization.
[50] Training Quantized Neural Networks With a Full-Precision Auxiliary Module PDF
[49] Optimal Clipping and Magnitude-aware Differentiation for Improved Quantization-aware Training PDF
[51] Punching Above Precision: Small Quantized Model Distillation with Learnable Regularizer PDF
[52] Stable Quantization-Aware Training with Adaptive Gradient Clipping PDF
[53] AQ-DETR: Low-Bit Quantized Detection Transformer with Auxiliary Queries PDF
Rank-decay strategy to eliminate inference overhead
The authors develop a rank-decay strategy that progressively removes the auxiliary modules during training so that they add no inference overhead. The strategy repeatedly applies singular value decomposition (SVD) and a rank-based regularization to identify low-contributing components and decay them, ultimately reaching zero additional inference cost while preserving performance.