Hilbert-Guided Sparse Local Attention

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: local attention, window attention, neighborhood attention, sliding window attention, Hilbert curve, attention acceleration
Abstract:

The quadratic compute and memory costs of global self-attention severely limit its use in high-resolution images. Local attention reduces complexity by restricting attention to neighborhoods. Block-sparse kernels can further improve the efficiency of local attention, but conventional local attention patterns often fail to deliver significant speedups because tokens within a window are not contiguous in the 1D sequence. This work proposes a novel method for constructing windows and neighborhoods based on the Hilbert curve. Image tokens are first reordered along a Hilbert curve, and windows and neighborhoods are then formed on the reordered 1D sequence. From a block-sparse perspective, this strategy significantly increases block sparsity and can be combined with existing block-sparse kernels to improve the efficiency of 2D local attention. Experiments show that the proposed Hilbert Window Attention and Hilbert Slide Attention can accelerate window attention and slide attention by about 4× and 18×, respectively. To assess practicality, the strategy is instantiated as the Hilbert Window Transformer and the Hilbert Neighborhood Transformer, both of which achieve end-to-end speedups with minimal accuracy loss. Overall, combining Hilbert-guided local attention with block-sparse kernels offers a general and practical approach to enhancing the efficiency of 2D local attention for images.
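As context for the reordering step described in the abstract, the sketch below shows the standard Hilbert curve index computation (the classic xy2d mapping) and how it can reorder a row-major token sequence. This is an illustrative sketch, not the paper's implementation; the grid size `n = 8` and all names are assumptions.

```python
def hilbert_index(n: int, x: int, y: int) -> int:
    """Position of grid cell (x, y) along the Hilbert curve on an
    n x n grid, where n is a power of two (standard xy2d mapping)."""
    d, s = 0, n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect so the recursion always sees a canonical sub-curve.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

# Reorder a flat row-major token sequence (token i sits at x = i % n,
# y = i // n) along the Hilbert curve. Fixed-size chunks of `order`
# then correspond to spatially compact 2D clusters of tokens.
n = 8  # illustrative grid side length
order = sorted(range(n * n), key=lambda i: hilbert_index(n, i % n, i // n))
```

Because consecutive cells on a Hilbert curve are always grid neighbors, contiguous windows taken over the reordered sequence stay spatially compact, which is the property the proposed method exploits.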

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes using Hilbert curve-based token reordering to construct local attention windows and neighborhoods, aiming to improve block sparsity for efficient computation on high-resolution images. It sits within the 'Spatial Locality-Based Block-Sparse Attention' leaf, which contains only three papers total. This is a relatively sparse research direction within the broader taxonomy of 29 papers across multiple application domains. The sibling papers include Generalized Neighborhood Attention and another Hilbert Block Sparse method, suggesting that curve-based reordering for block sparsity is an emerging but not yet crowded area.

The taxonomy tree reveals that the paper's leaf is part of the 'Block-Sparse Attention Mechanisms and Architectures' branch, which also includes dynamic/adaptive sparsity methods and sorting-based approaches. Neighboring leaves explore learned sparsity patterns (e.g., Adaptive Sparse Attention) and differentiable sorting for quasi-global attention, representing alternative strategies to fixed spatial reordering. The broader taxonomy shows substantial activity in application domains—video generation, super-resolution, medical imaging—indicating that foundational block-sparse mechanisms like this work may serve as building blocks for diverse downstream tasks.

Among the 30 candidates examined via limited semantic search, none were found to clearly refute any of the three main contributions: Hilbert-guided window construction (10 candidates examined, 0 refutable), the specific attention variants (10 examined, 0 refutable), and the transformer architectures (10 examined, 0 refutable). This suggests that within the examined scope, the combination of Hilbert curve reordering with block-sparse kernels for local attention appears relatively novel. However, the search was not exhaustive, and the presence of a sibling paper with a similar name ('Hilbert Block Sparse') warrants careful comparison to delineate incremental versus substantive differences.

Based on the limited search scope of 30 semantically related candidates, the work appears to occupy a niche intersection of spatial reordering and block-sparse efficiency. The taxonomy context indicates this is an emerging direction rather than a saturated one, though the sibling paper suggests some prior exploration of Hilbert-based methods. A more comprehensive literature review would be needed to fully assess whether the specific instantiations (Window/Slide/Neighborhood Attention variants) represent meaningful architectural innovations or incremental refinements of existing curve-based sparsity ideas.

Taxonomy

Core-task Taxonomy Papers: 29
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 0

Research Landscape Overview

Core task: efficient local attention for high-resolution images using block-sparse computation. The field addresses the computational bottleneck of standard attention mechanisms when applied to high-resolution visual data by exploiting spatial locality and sparsity.

The taxonomy reveals several main branches. Block-Sparse Attention Mechanisms and Architectures focuses on foundational designs that partition attention into localized blocks or neighborhoods, often leveraging spatial proximity to reduce complexity (e.g., Generalized Neighborhood Attention[5], Hilbert Block Sparse[19]). Video Generation and Processing with Block-Sparse Attention extends these ideas to temporal domains, where sparsity patterns must handle both spatial and temporal dependencies. Image Super-Resolution with Sparse and Local Attention applies block-sparse techniques to reconstruction tasks, balancing detail recovery with efficiency. Medical Image Analysis with Sparse Attention tailors these methods to domain-specific constraints such as volumetric data and limited annotations. Finally, Specialized Applications of Block-Sparse and Local Attention encompasses diverse use cases from pansharpening to radar imaging, demonstrating the broad applicability of locality-based sparsity.

A particularly active line of work explores how to define effective spatial neighborhoods and sparsity patterns that preserve long-range dependencies while maintaining computational efficiency. Trade-offs emerge between fixed geometric patterns (e.g., local windows or Hilbert curves) and adaptive schemes that learn sparsity dynamically (Adaptive Sparse Attention[4], Dynamic Block Sparse[13]).

Within this landscape, Hilbert Sparse Attention[0] sits in the Spatial Locality-Based Block-Sparse Attention cluster, emphasizing structured spatial orderings to define block patterns. This approach contrasts with Generalized Neighborhood Attention[5], which offers flexible neighborhood definitions, and with Hilbert Block Sparse[19], which similarly exploits space-filling curves but may differ in implementation details. The central question remains how to balance the expressiveness of learned sparsity against the predictability and hardware-friendliness of fixed geometric patterns, especially as resolutions continue to grow.

Claimed Contributions

Hilbert-guided construction of local windows and neighborhoods

The authors introduce a method that reorders image tokens along a Hilbert curve before forming windows and neighborhoods on the reordered 1D sequence. This strategy significantly increases block sparsity in local attention patterns, enabling more efficient computation when combined with block-sparse kernels.

Retrieved papers compared: 10
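The block-sparsity claim in this contribution can be made concrete with a small counting experiment: the sketch below counts how many nonzero (block × block) tiles a 2D neighborhood-attention mask produces under row-major versus Hilbert token orderings. All parameters (a 64×64 token grid, block size 64, Chebyshev radius 7) are illustrative assumptions, and the tile count is only a proxy for block-sparse kernel cost, not the paper's measurement.

```python
def hilbert_index(n, x, y):
    """Standard xy2d mapping: cell (x, y) -> position along the
    Hilbert curve on an n x n grid (n a power of two)."""
    d, s = 0, n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

def nonzero_blocks(pos, n, radius, block):
    """Count distinct (query-block, key-block) tiles touched by a 2D
    neighborhood mask (Chebyshev distance <= radius) when the token at
    grid cell (x, y) occupies 1D sequence position pos[y * n + x]."""
    tiles = set()
    for ty in range(n):
        for tx in range(n):
            qb = pos[ty * n + tx] // block
            for ky in range(max(0, ty - radius), min(n, ty + radius + 1)):
                for kx in range(max(0, tx - radius), min(n, tx + radius + 1)):
                    tiles.add((qb, pos[ky * n + kx] // block))
    return len(tiles)

n, radius, block = 64, 7, 64        # illustrative parameters
row_major = list(range(n * n))      # identity layout: position = token id
hilbert = [hilbert_index(n, t % n, t // n) for t in range(n * n)]

rm_count = nonzero_blocks(row_major, n, radius, block)
hc_count = nonzero_blocks(hilbert, n, radius, block)
print("row-major nonzero tiles:", rm_count)
print("Hilbert nonzero tiles:  ", hc_count)
```

Fewer nonzero tiles means fewer tiles a block-sparse kernel has to compute: under row-major order an interior query block's keys are scattered across all 15 rows within reach, whereas under Hilbert order each block of 64 consecutive positions is a compact 8×8 patch whose neighborhood touches at most a 3×3 group of patches. The exact ratio depends on grid size, block size, and radius.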
Hilbert Window Attention, Hilbert Slide Attention, and Hilbert Neighborhood Attention

The authors design three new attention mechanisms (HWA, HSA, and HNA) that leverage Hilbert reordering to create more efficient sparse attention patterns. These mechanisms achieve significant speedups (approximately 4× for window attention and 18× for slide attention) over conventional local attention approaches.

Retrieved papers compared: 10
Hilbert Window Transformer and Hilbert Neighborhood Transformer architectures

The authors develop two complete transformer architectures (HWT and HNT) that integrate Hilbert-guided local attention mechanisms. These models demonstrate practical feasibility by achieving end-to-end speedups while maintaining comparable accuracy to baseline models on ImageNet and CIFAR datasets.

Retrieved papers compared: 10

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution
