HoRA: Cross-Head Low-Rank Adaptation with Joint Hypernetworks

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Low-rank Adaptation, Multi-head Self-attention, Mixture of Experts, Hypernetworks
Abstract:

Low-Rank Adaptation (LoRA) is a parameter-efficient fine-tuning (PEFT) technique that adapts large pre-trained models by representing their weight updates as products of low-rank matrices. When fine-tuning multi-head self-attention (MHA), however, LoRA is applied to each attention head separately, thereby overlooking potential synergies across different heads. To mitigate this issue, we propose a novel Hyper-shared Low-Rank Adaptation (HoRA) method, which utilizes joint hypernetworks to generate the low-rank matrices across attention heads. By coupling head-wise adaptation through a shared generator, HoRA encourages cross-head information sharing and thus directly addresses the aforementioned limitation of LoRA. By comparing LoRA and HoRA through the lens of hierarchical mixture of experts, our theoretical findings reveal that HoRA achieves superior sample efficiency. Furthermore, through extensive experiments across diverse language and vision benchmarks, we demonstrate that HoRA outperforms LoRA and other PEFT methods while requiring only a marginal increase in the number of trainable parameters.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes HoRA, a cross-head low-rank adaptation method that uses joint hypernetworks to generate low-rank matrices across attention heads, enabling information sharing during fine-tuning. According to the taxonomy, this work occupies the 'Cross-Head Low-Rank Adaptation' leaf under 'Low-Rank Adaptation Methods', where it currently appears as the sole paper. This positioning suggests the paper addresses a relatively sparse research direction within the broader low-rank adaptation landscape, which contains multiple active leaves including standard per-head methods, orthogonality-constrained approaches, and multimodal adaptations.

The taxonomy reveals that HoRA sits adjacent to several related but distinct approaches. Its immediate neighbors include 'Standard Low-Rank Adaptation' methods that apply decomposition independently per head (e.g., Vision Transformer adaptations, Serial decomposition), 'Orthogonality-Constrained' methods that enforce structural properties, and 'Semantic-Guided' approaches that incorporate input semantics. The taxonomy's scope notes clarify that cross-head coupling distinguishes this work from standard per-head methods, while its hypernetwork-based sharing mechanism differentiates it from multi-task adapter routing strategies found in the 'Adapter-Based Methods' branch.

Among 16 candidates examined across three contributions, the analysis found limited prior-work overlap. For the core HoRA method (Contribution 1), 1 candidate was examined with no refutations, suggesting novelty in the specific hypernetwork-based cross-head coupling mechanism. For the theoretical connection to hierarchical mixture of experts (Contribution 2), 9 candidates were examined with 1 refutable match, indicating that some existing theoretical frameworks may overlap. For the sample-efficiency claim (Contribution 3), 6 candidates were examined without refutations. These statistics reflect a focused semantic-search scope rather than exhaustive coverage, and the sparse 'Cross-Head' leaf suggests this direction has received limited prior attention.

Given the limited search scope of 16 candidates and the paper's placement in a currently unpopulated taxonomy leaf, the work appears to explore a relatively underexplored mechanism for cross-head information sharing in low-rank adaptation. However, the analysis cannot rule out relevant prior work outside the examined candidate set, particularly in adjacent areas like multi-task adapter sharing or cross-layer smoothness exploitation that employ related coupling principles through different architectural choices.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 16
Refutable Papers: 1

Research Landscape Overview

Core task: parameter-efficient fine-tuning of multi-head self-attention mechanisms. The field has organized itself around several complementary strategies for adapting large pre-trained models without full retraining. Low-Rank Adaptation Methods decompose weight updates into compact factorizations, enabling efficient parameter updates across attention layers; works like Orthogonal Fine-tuning[3] and Householder Transformation[5] explore structured low-rank constraints. Prompt-Based Tuning Methods inject learnable tokens or attention-level prompts (e.g., Attention Prompt Tuning[11]) to steer model behavior with minimal added parameters. Adapter-Based Methods insert small bottleneck modules between transformer blocks, as seen in AdaptFormer[9] and AdaViT[8], while Attention Mechanism Modification directly restructures attention computations, for instance Alternating Attention[17] and Isomorphic Attention[20]. Dynamic and Adaptive Methods adjust tuning strategies on the fly (Dynamic Tuning[41], Adaptive Layer Selection[38]), Specialized Application Methods target domain-specific tasks like medical imaging (LiteMedSAM[31]) or action recognition (Action Recognition[28]), and Efficiency-Focused Optimization Methods prioritize inference speed and memory footprint (Economical Inference[4], Time-Memory Efficient[30]).

A particularly active line of research centers on low-rank factorizations that exploit cross-head or cross-layer structure, balancing expressiveness with compactness. HoRA[0] exemplifies this direction by introducing cross-head low-rank adaptation, sharing decomposition factors across multiple attention heads to reduce redundancy. This approach contrasts with methods like Trainable Self-Attention[1], which modifies attention weights more directly, and PARA[2], which applies parameter-efficient updates in a different structural regime.

Meanwhile, orthogonal constraints (Orthogonal Fine-tuning[3]) and transformation-based parameterizations (Householder Transformation[5]) offer alternative ways to maintain model stability and generalization during adaptation. The interplay between rank selection, head-wise sharing, and layer-specific tuning remains an open question, with HoRA[0] positioned among works that seek to exploit redundancy in multi-head architectures while preserving the expressive power needed for diverse downstream tasks.

Claimed Contributions

HoRA method with joint hypernetworks for cross-head information sharing

The authors introduce HoRA, a parameter-efficient fine-tuning technique that uses shared hypernetworks to generate low-rank adaptation matrices across multiple attention heads. This design encourages cross-head information sharing and addresses the limitation of LoRA, which adapts each attention head independently without coordination.

1 retrieved paper
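
The shared-generator mechanism described above can be sketched in a few lines. The following is a minimal illustrative sketch, not the authors' implementation: all sizes (`d_model`, `n_heads`, `r`, `d_emb`) and the single-linear-layer form of the hypernetwork are assumptions. The key point it shows is that every head's low-rank factors come out of the same generator weights, so the heads' adaptations are coupled rather than independent.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, n_heads, r = 64, 4, 2      # toy sizes (assumed, not from the paper)
d_head = d_model // n_heads         # 16
d_emb = 8                           # per-head embedding size (assumed)

# Trainable pieces: one small embedding per head, plus a SHARED linear
# hypernetwork (W_A, W_B) that maps an embedding to that head's factors.
head_emb = rng.normal(size=(n_heads, d_emb))
W_A = rng.normal(size=(d_emb, r * d_model)) * 0.02
W_B = rng.normal(size=(d_emb, d_head * r)) * 0.02

def generate_updates():
    """Generate each head's low-rank update Delta W_h = B_h @ A_h
    from the shared generator; gradients w.r.t. W_A/W_B would mix
    information from all heads during fine-tuning."""
    updates = []
    for e in head_emb:
        A = (e @ W_A).reshape(r, d_model)   # r x d_model
        B = (e @ W_B).reshape(d_head, r)    # d_head x r
        updates.append(B @ A)               # d_head x d_model, rank <= r
    return updates

deltas = generate_updates()
```

In a per-head LoRA baseline each `A_h`, `B_h` would be a free parameter; here they are functions of the shared `W_A`, `W_B`, which is what enables the cross-head information sharing the contribution claims.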
Theoretical connection between multi-head LoRA and hierarchical mixture of experts

The authors formalize a theoretical relationship showing that applying LoRA to multi-head self-attention can be reinterpreted as a Hierarchical Mixture-of-Experts model. This perspective provides a principled foundation for understanding and improving parameter-efficient fine-tuning in multi-head attention.

9 retrieved papers
Can Refute
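
One way to make the claimed reinterpretation concrete (the notation below is assumed for illustration, not taken from the paper) is to note that a single attention head already has mixture-of-experts form, and the stack of heads adds a second level of mixing:

```latex
% Single head h at query position i: a softmax-gated mixture over positions j,
% where each "expert" is the (low-rank-adapted) value projection of token x_j.
\mathrm{head}_h(x_i)
  = \sum_{j} \underbrace{\mathrm{softmax}_j\!\big(q_{h,i}^{\top} k_{h,j}/\sqrt{d}\big)}_{\text{gate } g_{h,j}(x)}
    \underbrace{\big(W^V_h + B_h A_h\big)\, x_j}_{\text{expert output}}

% Combining the h heads through the output projection adds a second,
% head-level mixing stage, i.e. a two-level (hierarchical) mixture:
\mathrm{MHA}(x_i) = \sum_{h} W^O_h \,\mathrm{head}_h(x_i)
```

Under this reading, per-head LoRA adapts each head's experts independently via its own $B_h A_h$, whereas HoRA ties the experts together by generating all $B_h A_h$ from one shared hypernetwork.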
Sample efficiency improvement from exponential to polynomial rate

The authors prove that HoRA's shared structure across attention heads improves the sample complexity of estimating low-rank matrices from exponential order to polynomial order. This theoretical result demonstrates that parameter sharing yields superior generalization guarantees compared to independent adaptation.

6 retrieved papers
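
The exponential-to-polynomial claim can be read, in hedged illustrative notation (the constants and exponents below are placeholders, not the paper's theorem statement), as a statement about how many samples $n$ are needed to estimate the low-rank factors to accuracy $\epsilon$:

```latex
% Illustrative reading only; eps_n is the estimation error of the low-rank
% factors after n samples, and c, c' > 0 are unspecified constants.
\text{independent per-head LoRA:}\quad
  \epsilon_n = O\!\big((\log n)^{-c}\big)
  \;\Longleftrightarrow\;
  n = \exp\!\big(\Omega(\epsilon^{-1/c})\big),
\qquad
\text{shared-generator HoRA:}\quad
  \epsilon_n = O\!\big(n^{-c'}\big)
  \;\Longleftrightarrow\;
  n = O\!\big(\epsilon^{-1/c'}\big).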

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current TopK core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it appears structurally isolated, which is a partial signal of novelty, though one still constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

HoRA method with joint hypernetworks for cross-head information sharing

The authors introduce HoRA, a parameter-efficient fine-tuning technique that uses shared hypernetworks to generate low-rank adaptation matrices across multiple attention heads. This design encourages cross-head information sharing and addresses the limitation of LoRA, which adapts each attention head independently without coordination.

Contribution

Theoretical connection between multi-head LoRA and hierarchical mixture of experts

The authors formalize a theoretical relationship showing that applying LoRA to multi-head self-attention can be reinterpreted as a Hierarchical Mixture-of-Experts model. This perspective provides a principled foundation for understanding and improving parameter-efficient fine-tuning in multi-head attention.

Contribution

Sample efficiency improvement from exponential to polynomial rate

The authors prove that HoRA's shared structure across attention heads improves the sample complexity of estimating low-rank matrices from exponential order to polynomial order. This theoretical result demonstrates that parameter sharing yields superior generalization guarantees compared to independent adaptation.