QWHA: Quantization-Aware Walsh-Hadamard Adaptation for Parameter-Efficient Fine-Tuning on Large Language Models

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: quantization, parameter-efficient fine-tuning, sparse adapter, Walsh-Hadamard transform
Abstract:

The demand for efficient deployment of large language models (LLMs) has driven interest in quantization, which reduces inference cost, and parameter-efficient fine-tuning (PEFT), which lowers training overhead. This has motivated the development of quantization-aware PEFT methods that produce accurate yet efficient quantized models. In this setting, reducing quantization error prior to fine-tuning is crucial for achieving high model accuracy. However, existing methods that rely on low-rank adaptation suffer from limited representational capacity. Recent Fourier-related transform (FT)-based adapters offer greater representational power than low-rank adapters, but their direct integration into quantized models often results in ineffective error reduction and increased computational overhead.
To overcome these limitations, we propose QWHA, a method that integrates FT-based adapters into quantized models by employing the Walsh-Hadamard Transform (WHT) as the transform kernel, together with a novel adapter initialization scheme incorporating adaptive parameter selection and value refinement. We demonstrate that QWHA effectively mitigates quantization errors while facilitating fine-tuning, and that its design substantially reduces computational cost. Experimental results show that QWHA consistently outperforms baselines in low-bit quantization accuracy and achieves significant training speedups over existing FT-based adapters.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes QWHA, a method that integrates Walsh-Hadamard Transform-based adapters into quantized large language models with a novel initialization scheme (AdaAlloc) for adaptive parameter selection and value refinement. Within the taxonomy, it resides in the 'Transform-Based Quantization-Aware Adaptation' leaf under 'Quantization-Aware Training Frameworks'. This leaf contains only two papers total, indicating a relatively sparse research direction compared to more crowded areas like 'Low-Rank Adaptation with Quantization Awareness', which spans multiple sub-categories with numerous papers.

The taxonomy structure reveals that QWHA's immediate neighbors include low-rank adaptation methods (LoRA variants with group-wise quantization, initialization-aware schemes, and rank-adaptive approaches) and outlier management techniques. The 'Transform-Based' leaf sits alongside these more populated branches, suggesting that Fourier-related and orthogonal transform approaches represent an emerging alternative to dominant low-rank paradigms. The taxonomy's scope note explicitly distinguishes transform-based methods from rotation-based outlier suppression and pure low-rank techniques, positioning QWHA at the intersection of representational capacity enhancement and quantization error mitigation.

Among 30 candidates examined, the contribution-level analysis shows mixed novelty signals. The overall QWHA method (10 candidates examined, 0 refutable) and the AdaAlloc initialization scheme (10 candidates examined, 0 refutable) appear to have no clear prior work overlap within the limited search scope. However, the Walsh-Hadamard Transform-based adapter design itself (10 candidates examined, 1 refutable) shows at least one candidate providing overlapping prior work. This suggests the transform kernel choice may have precedent, while the integration strategy and initialization approach appear more distinctive within the examined literature.

Based on the limited top-30 semantic search scope, QWHA appears to occupy a relatively under-explored niche combining transform-based adaptation with quantization-aware initialization. The sparse population of its taxonomy leaf and the lack of refutable candidates for two of three contributions suggest potential novelty, though the single refutable candidate for the WHT adapter design indicates some prior exploration of transform kernels. A more exhaustive search would be needed to definitively assess originality across the broader quantization-aware PEFT landscape.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 1

Research Landscape Overview

Core task: quantization-aware parameter-efficient fine-tuning for large language models. This field addresses the challenge of adapting massive pre-trained models to downstream tasks while simultaneously reducing memory and computational costs through quantization.

The taxonomy reveals a rich landscape organized around several complementary themes. Quantization-Aware Training Frameworks explore methods that integrate low-bit representations directly into the fine-tuning loop, often combining techniques like low-rank adaptation with learned quantization parameters (e.g., QA-LoRA[2], LLM-QAT[1]). Outlier and Activation Management tackles the problem of extreme values that degrade quantized model quality, while Specialized Quantization Techniques investigate novel bit-width schemes and mixed-precision strategies. Task-Agnostic and Pre-Training Quantization focuses on compressing models before task-specific adaptation, and Efficient Deployment branches emphasize inference-time optimizations. Domain-Specific and Multimodal Applications extend these ideas to specialized settings, and Reinforcement Learning paradigms incorporate quantization into policy optimization. Comprehensive Evaluation and Benchmarking provides systematic comparisons, while Survey and Review Literature synthesizes progress across the field. Foundational PEFT and Quantization Methods anchor the taxonomy with seminal works like QLoRA[48] and LoftQ[5].

A particularly active line of work centers on transform-based and initialization-aware strategies that carefully balance quantization error against adaptation capacity. QWHA[0] exemplifies this direction by proposing a Hadamard-transform approach to mitigate quantization-induced distortions during parameter-efficient tuning, situating itself within the Quantization-Aware Training Frameworks branch alongside methods like HALO[50] that similarly manipulate weight representations before quantization.
This contrasts with approaches such as LoftQ[5], which emphasizes joint initialization of quantized weights and low-rank adapters, or OWQ[3], which prioritizes outlier-aware schemes. The interplay between these strategies highlights a central trade-off: whether to invest effort in pre-quantization transformations, adaptive rank selection, or outlier suppression. QWHA[0] leans toward transformation-based mitigation, offering a complementary perspective to works that adjust rank dynamically or manage activations explicitly, and reflects ongoing exploration of how structural interventions can preserve fine-tuning expressiveness under aggressive compression.

Claimed Contributions

QWHA method integrating Walsh-Hadamard Transform-based adapter with quantization-aware initialization

The authors introduce QWHA, which combines a Walsh-Hadamard Transform-based adapter (WHA) with a quantization-aware initialization strategy for parameter-efficient fine-tuning of quantized large language models. This method addresses limitations of existing low-rank and Fourier-transform-based adapters in the quantization-aware setting.

10 retrieved papers
Walsh-Hadamard Transform-based adapter (WHA) design

The authors design a novel adapter using the Walsh-Hadamard Transform as the transform kernel, which consists only of ±1 entries enabling efficient computation through additions and subtractions. Unlike conventional Fourier-transform-based adapters, WHA applies a single transform rather than double transforms, reducing computational overhead while maintaining superior representational capacity.
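The kernel property described above can be illustrated with a minimal sketch. This is our own toy example, not the paper's implementation: it builds an unnormalized Sylvester-ordered fast Walsh-Hadamard transform (additions and subtractions only, reflecting the ±1 kernel) and forms a weight update `delta_W = H @ S` from a hypothetical sparse coefficient matrix `S` via a single one-sided transform.

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform along the last axis.
    Uses only additions and subtractions (the +/-1 kernel), O(n log n)."""
    x = x.copy()
    n = x.shape[-1]
    h = 1
    while h < n:
        for i in range(0, n, h * 2):
            a = x[..., i:i + h].copy()
            b = x[..., i + h:i + 2 * h].copy()
            x[..., i:i + h] = a + b          # butterfly: sums
            x[..., i + h:i + 2 * h] = a - b  # butterfly: differences
        h *= 2
    return x

# Hypothetical setup: a d x d update built from a sparse coefficient
# matrix S with only a few trainable entries (shapes are illustrative).
d = 8
rng = np.random.default_rng(0)
S = np.zeros((d, d))
idx = rng.choice(d * d, size=4, replace=False)
S.flat[idx] = rng.standard_normal(4)

# A single one-sided transform: applying the FWHT to the columns of S
# yields H @ S (H is symmetric), with no dense matrix multiply needed.
delta_W = fwht(S.T).T

# Sanity check against the explicit Sylvester-construction Hadamard matrix.
H = np.array([[1.0]])
while H.shape[0] < d:
    H = np.block([[H, H], [H, -H]])
assert np.allclose(delta_W, H @ S)
```

Because every butterfly stage touches each coordinate once, the transform costs `d log2(d)` additions/subtractions per column, which is the efficiency argument behind choosing the WHT kernel over complex-valued Fourier kernels.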

10 retrieved papers
Can Refute
AdaAlloc parameter selection and value refinement initialization scheme

The authors develop a tractable initialization solution consisting of AdaAlloc, which adaptively allocates parameters across output channels proportional to their quantization errors while ensuring full-rank capacity, and a value refinement step that re-projects selected parameters to minimize layer output error. This initialization effectively reduces quantization errors before fine-tuning.
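The allocation-plus-refinement idea can be sketched as follows. This is a simplified illustration under our own assumptions (a toy rounding quantizer, per-channel top-magnitude selection, and a least-squares refit on synthetic calibration activations), not the paper's exact AdaAlloc algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, budget = 6, 8, 16

W = rng.standard_normal((d_out, d_in))
W_q = np.round(W * 2) / 2          # toy quantizer (illustrative only)
E = W - W_q                        # per-weight quantization error

# Adaptive allocation: give each output channel a parameter share
# proportional to its quantization-error energy, with a floor of one
# parameter per channel (preserving full-rank capacity). A real
# implementation would rebalance to meet the exact budget.
err = np.square(E).sum(axis=1)
alloc = np.maximum(1, np.floor(budget * err / err.sum()).astype(int))

# Value refinement: per channel, pick the largest-error coordinates and
# re-fit their values by least squares to minimize layer *output* error
# on calibration activations X, rather than plain weight error.
X = rng.standard_normal((32, d_in))
delta = np.zeros_like(W)
for c in range(d_out):
    sel = np.argsort(-np.abs(E[c]))[:alloc[c]]
    # solve min_v || X[:, sel] @ v - X @ E[c] ||
    v, *_ = np.linalg.lstsq(X[:, sel], X @ E[c], rcond=None)
    delta[c, sel] = v

# The refined sparse correction cannot increase layer output error,
# since v = 0 is always a feasible least-squares solution.
before = np.linalg.norm(X @ E.T)
after = np.linalg.norm(X @ (E - delta).T)
assert after <= before + 1e-9
```

The key point the sketch captures is that refinement re-projects the selected parameters against the layer's input distribution, so the sparse initialization reduces output error, not merely elementwise weight error.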

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution
