Hilbert-Guided Sparse Local Attention

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: local attention, window attention, neighborhood attention, sliding window attention, Hilbert curve, attention acceleration
Abstract:

The quadratic compute and memory costs of global self-attention severely limit its use in high-resolution images. Local attention reduces complexity by restricting attention to neighborhoods. Block-sparse kernels can further improve the efficiency of local attention, but conventional local attention patterns often fail to deliver significant speedups because tokens within a window are not contiguous in the 1D sequence. This work proposes a novel method for constructing windows and neighborhoods based on the Hilbert curve. Image tokens are first reordered along a Hilbert curve, and windows and neighborhoods are then formed on the reordered 1D sequence. From a block-sparse perspective, this strategy significantly increases block sparsity and can be combined with existing block-sparse kernels to improve the efficiency of 2D local attention. Experiments show that the proposed Hilbert Window Attention and Hilbert Slide Attention can accelerate window attention and slide attention by about 4× and 18×, respectively. To assess practicality, the strategy is instantiated as the Hilbert Window Transformer and the Hilbert Neighborhood Transformer, both of which achieve end-to-end speedups with minimal accuracy loss. Overall, combining Hilbert-guided local attention with block-sparse kernels offers a general and practical approach to enhancing the efficiency of 2D local attention for images.
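As context for the reordering step described in the abstract, the sketch below shows the standard Hilbert curve index computation (the classic xy2d mapping) and how it can reorder a row-major token sequence. This is an illustrative sketch, not the paper's implementation; the grid size `n = 8` and all names are assumptions.

```python
def hilbert_index(n: int, x: int, y: int) -> int:
    """Position of grid cell (x, y) along the Hilbert curve on an
    n x n grid, where n is a power of two (standard xy2d mapping)."""
    d, s = 0, n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect so the recursion always sees a canonical sub-curve.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

# Reorder a flat row-major token sequence (token i sits at x = i % n,
# y = i // n) along the Hilbert curve. Fixed-size chunks of `order`
# then correspond to spatially compact 2D clusters of tokens.
n = 8  # illustrative grid side length
order = sorted(range(n * n), key=lambda i: hilbert_index(n, i % n, i // n))
```

Because consecutive cells on a Hilbert curve are always grid neighbors, contiguous windows taken over the reordered sequence stay spatially compact, which is the property the proposed method exploits.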

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes using Hilbert curve-based token reordering to construct local attention windows and neighborhoods, aiming to improve block sparsity for efficient computation on high-resolution images. It sits within the 'Spatial Locality-Based Block-Sparse Attention' leaf, which contains only three papers total. This is a relatively sparse research direction within the broader taxonomy of 29 papers across multiple application domains. The sibling papers include Generalized Neighborhood Attention and another Hilbert Block Sparse method, suggesting that curve-based reordering for block sparsity is an emerging but not yet crowded area.

The taxonomy tree reveals that the paper's leaf is part of the 'Block-Sparse Attention Mechanisms and Architectures' branch, which also includes dynamic/adaptive sparsity methods and sorting-based approaches. Neighboring leaves explore learned sparsity patterns (e.g., Adaptive Sparse Attention) and differentiable sorting for quasi-global attention, representing alternative strategies to fixed spatial reordering. The broader taxonomy shows substantial activity in application domains—video generation, super-resolution, medical imaging—indicating that foundational block-sparse mechanisms like this work may serve as building blocks for diverse downstream tasks.

Among the 30 candidates examined via limited semantic search, none were found to clearly refute any of the three main contributions: Hilbert-guided window construction (10 candidates examined, 0 refutable), the specific attention variants (10 examined, 0 refutable), and the transformer architectures (10 examined, 0 refutable). This suggests that within the examined scope, the combination of Hilbert curve reordering with block-sparse kernels for local attention appears relatively novel. However, the search was not exhaustive, and the presence of a sibling paper with a similar name ('Hilbert Block Sparse') warrants careful comparison to delineate incremental versus substantive differences.

Based on the limited search scope of 30 semantically related candidates, the work appears to occupy a niche intersection of spatial reordering and block-sparse efficiency. The taxonomy context indicates this is an emerging direction rather than a saturated one, though the sibling paper suggests some prior exploration of Hilbert-based methods. A more comprehensive literature review would be needed to fully assess whether the specific instantiations (Window/Slide/Neighborhood Attention variants) represent meaningful architectural innovations or incremental refinements of existing curve-based sparsity ideas.

Taxonomy

Core-task Taxonomy Papers: 29
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 0

Research Landscape Overview

Core task: efficient local attention for high-resolution images using block-sparse computation. The field addresses the computational bottleneck of standard attention mechanisms when applied to high-resolution visual data by exploiting spatial locality and sparsity.

The taxonomy reveals several main branches. Block-Sparse Attention Mechanisms and Architectures focuses on foundational designs that partition attention into localized blocks or neighborhoods, often leveraging spatial proximity to reduce complexity (e.g., Generalized Neighborhood Attention[5], Hilbert Block Sparse[19]). Video Generation and Processing with Block-Sparse Attention extends these ideas to temporal domains, where sparsity patterns must handle both spatial and temporal dependencies. Image Super-Resolution with Sparse and Local Attention applies block-sparse techniques to reconstruction tasks, balancing detail recovery with efficiency. Medical Image Analysis with Sparse Attention tailors these methods to domain-specific constraints such as volumetric data and limited annotations. Finally, Specialized Applications of Block-Sparse and Local Attention encompasses diverse use cases from pansharpening to radar imaging, demonstrating the broad applicability of locality-based sparsity.

A particularly active line of work explores how to define effective spatial neighborhoods and sparsity patterns that preserve long-range dependencies while maintaining computational efficiency. Trade-offs emerge between fixed geometric patterns (e.g., local windows or Hilbert curves) and adaptive schemes that learn sparsity dynamically (Adaptive Sparse Attention[4], Dynamic Block Sparse[13]).

Within this landscape, Hilbert Sparse Attention[0] sits in the Spatial Locality-Based Block-Sparse Attention cluster, emphasizing structured spatial orderings to define block patterns. This approach contrasts with Generalized Neighborhood Attention[5], which offers flexible neighborhood definitions, and with Hilbert Block Sparse[19], which similarly exploits space-filling curves but may differ in implementation details. The central question remains how to balance the expressiveness of learned sparsity against the predictability and hardware-friendliness of fixed geometric patterns, especially as resolutions continue to grow.

Claimed Contributions

Hilbert-guided construction of local windows and neighborhoods

The authors introduce a method that reorders image tokens along a Hilbert curve before forming windows and neighborhoods on the reordered 1D sequence. This strategy significantly increases block sparsity in local attention patterns, enabling more efficient computation when combined with block-sparse kernels.

Retrieved papers compared: 10
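The block-sparsity claim in this contribution can be made concrete with a small counting experiment: the sketch below counts how many nonzero (block × block) tiles a 2D neighborhood-attention mask produces under row-major versus Hilbert token orderings. All parameters (a 64×64 token grid, block size 64, Chebyshev radius 7) are illustrative assumptions, and the tile count is only a proxy for block-sparse kernel cost, not the paper's measurement.

```python
def hilbert_index(n, x, y):
    """Standard xy2d mapping: cell (x, y) -> position along the
    Hilbert curve on an n x n grid (n a power of two)."""
    d, s = 0, n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

def nonzero_blocks(pos, n, radius, block):
    """Count distinct (query-block, key-block) tiles touched by a 2D
    neighborhood mask (Chebyshev distance <= radius) when the token at
    grid cell (x, y) occupies 1D sequence position pos[y * n + x]."""
    tiles = set()
    for ty in range(n):
        for tx in range(n):
            qb = pos[ty * n + tx] // block
            for ky in range(max(0, ty - radius), min(n, ty + radius + 1)):
                for kx in range(max(0, tx - radius), min(n, tx + radius + 1)):
                    tiles.add((qb, pos[ky * n + kx] // block))
    return len(tiles)

n, radius, block = 64, 7, 64        # illustrative parameters
row_major = list(range(n * n))      # identity layout: position = token id
hilbert = [hilbert_index(n, t % n, t // n) for t in range(n * n)]

rm_count = nonzero_blocks(row_major, n, radius, block)
hc_count = nonzero_blocks(hilbert, n, radius, block)
print("row-major nonzero tiles:", rm_count)
print("Hilbert nonzero tiles:  ", hc_count)
```

Fewer nonzero tiles means fewer tiles a block-sparse kernel has to compute: under row-major order an interior query block's keys are scattered across all 15 rows within reach, whereas under Hilbert order each block of 64 consecutive positions is a compact 8×8 patch whose neighborhood touches at most a 3×3 group of patches. The exact ratio depends on grid size, block size, and radius.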
Hilbert Window Attention, Hilbert Slide Attention, and Hilbert Neighborhood Attention

The authors design three new attention mechanisms (HWA, HSA, and HNA) that leverage Hilbert reordering to create more efficient sparse attention patterns. These mechanisms achieve significant speedups (approximately 4× for window attention and 18× for slide attention) over conventional local attention approaches.

Retrieved papers compared: 10
Hilbert Window Transformer and Hilbert Neighborhood Transformer architectures

The authors develop two complete transformer architectures (HWT and HNT) that integrate Hilbert-guided local attention mechanisms. These models demonstrate practical feasibility by achieving end-to-end speedups while maintaining comparable accuracy to baseline models on ImageNet and CIFAR datasets.

Retrieved papers compared: 10

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution
