LLaVA-FA: Learning Fourier Approximation for Compressing Large Multimodal Models

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: Large Multimodal Models, Model Compression, Fourier Domain, Matrix Approximation
Abstract:

Large multimodal models (LMMs) have achieved impressive performance on a wide range of vision-language tasks, but their substantial computational and memory costs hinder practical deployment. Existing compression methods often decouple low-rank decomposition from quantization, leading to compounded reconstruction errors, especially in multimodal architectures with cross-modal redundancy. To address this issue, we propose LLaVA-FA, an efficient LMM that performs joint low-rank plus quantization approximation in the frequency domain. By leveraging the de-correlation and conjugate-symmetry properties of the Fourier transform, LLaVA-FA achieves more compact and accurate weight representations. Furthermore, we introduce PolarQuant, a polar-coordinate quantization method tailored to complex matrices, and an Optional Diagonal Calibration (ODC) scheme that eliminates the need for large-scale calibration data. Extensive experiments demonstrate that LLaVA-FA outperforms existing efficient multimodal models across multiple benchmarks while keeping activated parameters and computational costs low, validating it as a powerful solution for compressing LMMs.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work; the current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes LLaVA-FA, which performs joint low-rank and quantization approximation in the frequency domain for compressing large multimodal models. According to the taxonomy tree, this work occupies the 'Frequency-Domain Joint Approximation' leaf under 'Joint Low-Rank and Quantization Methods'. Notably, this leaf contains only the original paper itself with no sibling papers, indicating this is a relatively unexplored research direction. The broader parent category 'Joint Low-Rank and Quantization Methods' contains only five papers total across three leaves, suggesting the joint approximation approach represents a sparse area within the compression landscape.

The taxonomy reveals that most related work pursues alternative strategies. The sibling leaf 'Spatial-Domain Joint Approximation' contains two papers performing joint optimization directly in weight space rather than frequency domain. Another sibling leaf 'Multi-Technique Integration' adds pruning or sparsity to the joint framework. Neighboring branches include 'Quantization-Aware Low-Rank Adaptation' with nine papers integrating quantization into LoRA-based fine-tuning, and 'Post-Training Quantization' with four papers applying compression without retraining. The taxonomy's scope and exclude notes clarify that frequency-domain methods are distinguished from spatial approaches by their use of Fourier transforms, while methods combining only low-rank and quantization without additional techniques belong in the joint approximation categories.

Among fifteen candidates examined across three contributions, none were found to clearly refute the proposed work. The core contribution 'LLaVA-FA' examined two candidates with zero refutable matches. The 'PolarQuant' quantization scheme examined ten candidates, again with no refutations, suggesting this polar-coordinate approach for complex matrices may be novel within the limited search scope. The 'Optional Diagonal Calibration' scheme examined three candidates without finding overlapping prior work. These statistics indicate that within the top-fifteen semantic matches analyzed, the paper's specific combination of frequency-domain joint approximation, polar quantization, and calibration-free optimization appears distinct from existing approaches.

Taxonomy

35 Core-task Taxonomy Papers
3 Claimed Contributions
15 Contribution Candidate Papers Compared
0 Refutable Papers

Research Landscape Overview

Core task: compressing large multimodal models via joint low-rank and quantization approximation. The field has evolved into a rich landscape of techniques that combine low-rank decomposition and quantization to reduce model size and computational cost. The taxonomy reveals several major branches: some focus on joint methods that simultaneously optimize both low-rank structure and quantization (e.g., LLaVA-FA[0], CASP[2]), while others pursue quantization-aware low-rank adaptation, where quantization is integrated into the training or fine-tuning process (e.g., QA-LoRA[11], LoQT[12]). Post-training quantization methods apply compression after model training, and low-rank correction techniques use low-rank updates to recover accuracy lost during quantization (e.g., Lord[23]). Additional branches address specialized architectures, deployment optimizations for edge devices (e.g., edge-side NPU inference optimization[3]), domain-specific applications, and comparative surveys (e.g., Comprehensive survey of model[5]) that synthesize these diverse approaches.

A particularly active line of work explores how to balance compression ratio against task performance, with many studies investigating adaptive rank selection and mixed-precision strategies (e.g., AdaQLoRA[28], MLoRQ[29]). Trade-offs between training efficiency and final model quality remain central: some methods prioritize minimal retraining overhead, while others accept longer fine-tuning to achieve tighter compression.

LLaVA-FA[0] sits within the joint approximation branch, emphasizing frequency-domain techniques that unify the low-rank and quantization steps in a single optimization framework. This contrasts with approaches like QA-LoRA[11] or LoQT[12], which layer quantization awareness onto existing low-rank adaptation schemes, and with post-training methods that avoid retraining altogether. The interplay between these strategies highlights ongoing questions about when joint optimization yields the best compression-accuracy frontier and how domain-specific constraints (e.g., multimodal versus language-only models) shape method design.

Claimed Contributions

LLaVA-FA: Joint low-rank plus quantization approximation in frequency domain

The authors introduce LLaVA-FA, a framework that decomposes weight matrices of large multimodal models into low-rank plus quantized components using Fourier transform. This approach leverages de-correlation and conjugate symmetry properties to achieve more compact and accurate weight representations than spatial-domain methods.
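The paper does not publish its algorithm here, but the claimed decomposition can be illustrated with a minimal numpy sketch: transform a weight matrix with a real-input FFT (so conjugate symmetry halves the stored spectrum), keep a truncated SVD of the complex spectrum as the low-rank part, quantize the residual, and invert the transform. All function names, bit-widths, and ranks below are illustrative assumptions, not the authors' implementation; the residual is quantized per real/imaginary part here as a simple stand-in for PolarQuant.

```python
import numpy as np

def dequantize_uniform(x, bits):
    # Uniform symmetric quantization of a real array (quantize + dequantize).
    peak = np.max(np.abs(x))
    scale = peak / (2 ** (bits - 1) - 1) if peak > 0 else 1.0
    return np.round(x / scale) * scale

def frequency_domain_compress(W, rank=8, bits=4):
    # Real-input 2D FFT: conjugate symmetry means only about half the
    # spectrum needs to be stored.
    F = np.fft.rfft2(W)
    # Low-rank component from a truncated SVD of the complex spectrum.
    U, s, Vh = np.linalg.svd(F, full_matrices=False)
    L = (U[:, :rank] * s[:rank]) @ Vh[:rank]
    # Quantize the residual spectrum; real and imaginary parts are
    # handled separately here as a stand-in for the paper's PolarQuant.
    R = F - L
    R_hat = dequantize_uniform(R.real, bits) + 1j * dequantize_uniform(R.imag, bits)
    # Back to weight space.
    return np.fft.irfft2(L + R_hat, s=W.shape)

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
W_hat = frequency_domain_compress(W)
rel_err = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
```

In this sketch the low-rank factors are kept in full precision and only the spectral residual is quantized; the paper's actual storage format and optimization may differ.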

2 retrieved papers
PolarQuant: Polar-coordinate quantization for complex matrices

The authors design PolarQuant, a quantization codec that separately discretizes amplitude and phase in polar coordinates for complex matrices. This method preserves complex structure and stabilizes low-bit reconstruction in the frequency domain.
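The amplitude/phase split described above can be sketched in a few lines of numpy. This is a hypothetical reading of the codec, not the paper's code: magnitudes get a uniform grid on [0, max], while phases get a uniform grid on [-pi, pi), where the wrap-around means every phase level is usable.

```python
import numpy as np

def polar_quantize(F, mag_bits=4, phase_bits=4):
    # Decompose each complex entry into magnitude and phase.
    mag, phase = np.abs(F), np.angle(F)
    # Uniform quantization of magnitudes on [0, max].
    peak = mag.max()
    mag_scale = peak / (2 ** mag_bits - 1) if peak > 0 else 1.0
    mag_hat = np.round(mag / mag_scale) * mag_scale
    # Uniform quantization of phases on [-pi, pi); the grid wraps
    # around, so all 2**phase_bits levels are usable.
    phase_step = 2 * np.pi / (2 ** phase_bits)
    phase_hat = np.round(phase / phase_step) * phase_step
    # Recombine into a complex estimate.
    return mag_hat * np.exp(1j * phase_hat)

rng = np.random.default_rng(1)
F = rng.standard_normal((32, 32)) + 1j * rng.standard_normal((32, 32))
F_hat = polar_quantize(F)
rel_err = np.linalg.norm(F - F_hat) / np.linalg.norm(F)
```

Compared with quantizing real and imaginary parts independently, this keeps the quantization error of each entry aligned with its polar structure, which is the stabilization property the contribution claims.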

10 retrieved papers
Optional Diagonal Calibration (ODC) scheme

The authors propose ODC, a calibration scheme that approximates the full Hessian matrix using row and column means. This enables robust compression without requiring large-scale calibration datasets, making the method more practical for deployment.
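One plausible reading of this description, sketched below in numpy, is an additive reconstruction of the Hessian from its row-mean and column-mean profiles (with the grand mean subtracted to avoid double-counting). The exact ODC formula is not given here, so `odc_hessian` and the GPTQ-style Hessian proxy `H = X.T @ X / n` are illustrative assumptions.

```python
import numpy as np

def odc_hessian(H):
    # Row and column mean profiles of the Hessian.
    row_mean = H.mean(axis=1, keepdims=True)   # shape (n, 1)
    col_mean = H.mean(axis=0, keepdims=True)   # shape (1, n)
    grand = H.mean()
    # Additive reconstruction from the two profiles; subtracting the
    # grand mean avoids counting it twice.
    return row_mean + col_mean - grand

# GPTQ-style Hessian proxy built from random stand-in "activations".
rng = np.random.default_rng(2)
X = rng.standard_normal((128, 64))
H = X.T @ X / X.shape[0]
H_hat = odc_hessian(H)
```

The practical point is that the surrogate needs only two mean vectors rather than activation statistics from a large calibration set, which matches the claimed deployment benefit.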

3 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current top-K retrieved core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it appears structurally isolated, which is a partial signal of novelty, though one constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

LLaVA-FA: Joint low-rank plus quantization approximation in frequency domain

The authors introduce LLaVA-FA, a framework that decomposes weight matrices of large multimodal models into low-rank plus quantized components using Fourier transform. This approach leverages de-correlation and conjugate symmetry properties to achieve more compact and accurate weight representations than spatial-domain methods.

Contribution

PolarQuant: Polar-coordinate quantization for complex matrices

The authors design PolarQuant, a quantization codec that separately discretizes amplitude and phase in polar coordinates for complex matrices. This method preserves complex structure and stabilizes low-bit reconstruction in the frequency domain.

Contribution

Optional Diagonal Calibration (ODC) scheme

The authors propose ODC, a calibration scheme that approximates the full Hessian matrix using row and column means. This enables robust compression without requiring large-scale calibration datasets, making the method more practical for deployment.