QWHA: Quantization-Aware Walsh-Hadamard Adaptation for Parameter-Efficient Fine-Tuning on Large Language Models

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: quantization, parameter-efficient fine-tuning, sparse adapter, Walsh-Hadamard transform
Abstract:

The demand for efficient deployment of large language models (LLMs) has driven interest in quantization, which reduces inference cost, and parameter-efficient fine-tuning (PEFT), which lowers training overhead. This has motivated the development of quantization-aware PEFT methods that produce accurate yet efficient quantized models. In this setting, reducing quantization error prior to fine-tuning is crucial for achieving high model accuracy. However, existing methods that rely on low-rank adaptation suffer from limited representational capacity. Recent Fourier-related transform (FT)-based adapters offer greater representational power than low-rank adapters, but their direct integration into quantized models often results in ineffective error reduction and increased computational overhead.
To overcome these limitations, we propose QWHA, a method that integrates FT-based adapters into quantized models by employing the Walsh-Hadamard Transform (WHT) as the transform kernel, together with a novel adapter initialization scheme incorporating adaptive parameter selection and value refinement. We demonstrate that QWHA effectively mitigates quantization errors while facilitating fine-tuning, and that its design substantially reduces computational cost. Experimental results show that QWHA consistently outperforms baselines in low-bit quantization accuracy and achieves significant training speedups over existing FT-based adapters.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes QWHA, a method that integrates Walsh-Hadamard Transform-based adapters into quantized large language models with a novel initialization scheme (AdaAlloc) for adaptive parameter selection and value refinement. Within the taxonomy, it resides in the 'Transform-Based Quantization-Aware Adaptation' leaf under 'Quantization-Aware Training Frameworks'. This leaf contains only two papers total, indicating a relatively sparse research direction compared to more crowded areas like 'Low-Rank Adaptation with Quantization Awareness', which spans multiple sub-categories with numerous papers.

The taxonomy structure reveals that QWHA's immediate neighbors include low-rank adaptation methods (LoRA variants with group-wise quantization, initialization-aware schemes, and rank-adaptive approaches) and outlier management techniques. The 'Transform-Based' leaf sits alongside these more populated branches, suggesting that Fourier-related and orthogonal transform approaches represent an emerging alternative to dominant low-rank paradigms. The taxonomy's scope note explicitly distinguishes transform-based methods from rotation-based outlier suppression and pure low-rank techniques, positioning QWHA at the intersection of representational capacity enhancement and quantization error mitigation.

Among 30 candidates examined, the contribution-level analysis shows mixed novelty signals. The overall QWHA method (10 candidates examined, 0 refutable) and the AdaAlloc initialization scheme (10 candidates examined, 0 refutable) appear to have no clear prior work overlap within the limited search scope. However, the Walsh-Hadamard Transform-based adapter design itself (10 candidates examined, 1 refutable) shows at least one candidate providing overlapping prior work. This suggests the transform kernel choice may have precedent, while the integration strategy and initialization approach appear more distinctive within the examined literature.

Based on the limited top-30 semantic search scope, QWHA appears to occupy a relatively under-explored niche combining transform-based adaptation with quantization-aware initialization. The sparse population of its taxonomy leaf and the lack of refutable candidates for two of three contributions suggest potential novelty, though the single refutable candidate for the WHT adapter design indicates some prior exploration of transform kernels. A more exhaustive search would be needed to definitively assess originality across the broader quantization-aware PEFT landscape.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 1

Research Landscape Overview

Core task: quantization-aware parameter-efficient fine-tuning for large language models. This field addresses the challenge of adapting massive pre-trained models to downstream tasks while simultaneously reducing memory and computational costs through quantization.

The taxonomy reveals a rich landscape organized around several complementary themes. Quantization-Aware Training Frameworks explore methods that integrate low-bit representations directly into the fine-tuning loop, often combining techniques like low-rank adaptation with learned quantization parameters (e.g., QA-LoRA[2], LLM-QAT[1]). Outlier and Activation Management tackles the problem of extreme values that degrade quantized model quality, while Specialized Quantization Techniques investigate novel bit-width schemes and mixed-precision strategies. Task-Agnostic and Pre-Training Quantization focuses on compressing models before task-specific adaptation, and Efficient Deployment branches emphasize inference-time optimizations. Domain-Specific and Multimodal Applications extend these ideas to specialized settings, and Reinforcement Learning paradigms incorporate quantization into policy optimization. Comprehensive Evaluation and Benchmarking provides systematic comparisons, while Survey and Review Literature synthesizes progress across the field. Foundational PEFT and Quantization Methods anchor the taxonomy with seminal works like QLoRA[48] and LoftQ[5].

A particularly active line of work centers on transform-based and initialization-aware strategies that carefully balance quantization error against adaptation capacity. QWHA[0] exemplifies this direction by proposing a Hadamard-transform approach to mitigate quantization-induced distortions during parameter-efficient tuning, situating itself within the Quantization-Aware Training Frameworks branch alongside methods like HALO[50] that similarly manipulate weight representations before quantization.
This contrasts with approaches such as LoftQ[5], which emphasizes joint initialization of quantized weights and low-rank adapters, or OWQ[3], which prioritizes outlier-aware schemes. The interplay between these strategies highlights a central trade-off: whether to invest effort in pre-quantization transformations, adaptive rank selection, or outlier suppression. QWHA[0] leans toward transformation-based mitigation, offering a complementary perspective to works that adjust rank dynamically or manage activations explicitly, and reflects ongoing exploration of how structural interventions can preserve fine-tuning expressiveness under aggressive compression.

Claimed Contributions

QWHA method integrating Walsh-Hadamard Transform-based adapter with quantization-aware initialization

The authors introduce QWHA, which combines a Walsh-Hadamard Transform-based adapter (WHA) with a quantization-aware initialization strategy for parameter-efficient fine-tuning of quantized large language models. This method addresses limitations of existing low-rank and Fourier-transform-based adapters in the quantization-aware setting.

10 retrieved papers
Walsh-Hadamard Transform-based adapter (WHA) design

The authors design a novel adapter using the Walsh-Hadamard Transform as the transform kernel, which consists only of ±1 entries enabling efficient computation through additions and subtractions. Unlike conventional Fourier-transform-based adapters, WHA applies a single transform rather than double transforms, reducing computational overhead while maintaining superior representational capacity.
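The kernel property described above can be illustrated with a minimal sketch. This is our own toy example, not the paper's implementation: it builds an unnormalized Sylvester-ordered fast Walsh-Hadamard transform (additions and subtractions only, reflecting the ±1 kernel) and forms a weight update `delta_W = H @ S` from a hypothetical sparse coefficient matrix `S` via a single one-sided transform.

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform along the last axis.
    Uses only additions and subtractions (the +/-1 kernel), O(n log n)."""
    x = x.copy()
    n = x.shape[-1]
    h = 1
    while h < n:
        for i in range(0, n, h * 2):
            a = x[..., i:i + h].copy()
            b = x[..., i + h:i + 2 * h].copy()
            x[..., i:i + h] = a + b          # butterfly: sums
            x[..., i + h:i + 2 * h] = a - b  # butterfly: differences
        h *= 2
    return x

# Hypothetical setup: a d x d update built from a sparse coefficient
# matrix S with only a few trainable entries (shapes are illustrative).
d = 8
rng = np.random.default_rng(0)
S = np.zeros((d, d))
idx = rng.choice(d * d, size=4, replace=False)
S.flat[idx] = rng.standard_normal(4)

# A single one-sided transform: applying the FWHT to the columns of S
# yields H @ S (H is symmetric), with no dense matrix multiply needed.
delta_W = fwht(S.T).T

# Sanity check against the explicit Sylvester-construction Hadamard matrix.
H = np.array([[1.0]])
while H.shape[0] < d:
    H = np.block([[H, H], [H, -H]])
assert np.allclose(delta_W, H @ S)
```

Because every butterfly stage touches each coordinate once, the transform costs `d log2(d)` additions/subtractions per column, which is the efficiency argument behind choosing the WHT kernel over complex-valued Fourier kernels.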

10 retrieved papers
Can Refute
AdaAlloc parameter selection and value refinement initialization scheme

The authors develop a tractable initialization solution consisting of AdaAlloc, which adaptively allocates parameters across output channels proportional to their quantization errors while ensuring full-rank capacity, and a value refinement step that re-projects selected parameters to minimize layer output error. This initialization effectively reduces quantization errors before fine-tuning.
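The allocation-plus-refinement idea can be sketched as follows. This is a simplified illustration under our own assumptions (a toy rounding quantizer, per-channel top-magnitude selection, and a least-squares refit on synthetic calibration activations), not the paper's exact AdaAlloc algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, budget = 6, 8, 16

W = rng.standard_normal((d_out, d_in))
W_q = np.round(W * 2) / 2          # toy quantizer (illustrative only)
E = W - W_q                        # per-weight quantization error

# Adaptive allocation: give each output channel a parameter share
# proportional to its quantization-error energy, with a floor of one
# parameter per channel (preserving full-rank capacity). A real
# implementation would rebalance to meet the exact budget.
err = np.square(E).sum(axis=1)
alloc = np.maximum(1, np.floor(budget * err / err.sum()).astype(int))

# Value refinement: per channel, pick the largest-error coordinates and
# re-fit their values by least squares to minimize layer *output* error
# on calibration activations X, rather than plain weight error.
X = rng.standard_normal((32, d_in))
delta = np.zeros_like(W)
for c in range(d_out):
    sel = np.argsort(-np.abs(E[c]))[:alloc[c]]
    # solve min_v || X[:, sel] @ v - X @ E[c] ||
    v, *_ = np.linalg.lstsq(X[:, sel], X @ E[c], rcond=None)
    delta[c, sel] = v

# The refined sparse correction cannot increase layer output error,
# since v = 0 is always a feasible least-squares solution.
before = np.linalg.norm(X @ E.T)
after = np.linalg.norm(X @ (E - delta).T)
assert after <= before + 1e-9
```

The key point the sketch captures is that refinement re-projects the selected parameters against the layer's input distribution, so the sparse initialization reduces output error, not merely elementwise weight error.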

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution
