Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation
Research Landscape Overview
Claimed Contributions
The authors introduce an architecture that decouples context representation from token generation via learnable position query tokens. This design supports arbitrary generation orders and degrees of parallelism, maintains mutual visibility among concurrently generated tokens through specialized attention, and retains KV caching to avoid redundant computation.
The authors propose a generation-order schedule guided by two principles: target positions should be spatially close to the existing context for strong conditioning, and concurrently generated tokens should be spatially distant from one another to reduce mutual dependency. The schedule exploits the spatial locality observed in the attention patterns of autoregressive image generators.
The authors present a complete framework that combines flexible parallelized autoregressive modeling with locality-aware generation ordering, reducing the number of generation steps from 256 to 20 at 256×256 resolution and from 1024 to 48 at 512×512, while maintaining generation quality and achieving at least 3.4× lower latency than previous parallelized autoregressive models.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[2] Parallelized autoregressive visual generation
Contribution Analysis
Detailed comparisons for each claimed contribution
Flexible Parallelized Autoregressive Modeling
The authors introduce an architecture that decouples context representation from token generation via learnable position query tokens. This design supports arbitrary generation orders and degrees of parallelism, maintains mutual visibility among concurrently generated tokens through specialized attention, and retains KV caching to avoid redundant computation.
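The decoupling can be pictured as one attention step in which position query tokens read from the cached context and from each other. The following is a minimal numpy sketch under stated assumptions, not the authors' implementation: `parallel_decode_step`, `w_k`, and `w_v` are illustrative names, and the real model would use multi-head attention inside a full transformer.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def parallel_decode_step(ctx_k, ctx_v, pos_queries, w_k, w_v):
    # ctx_k, ctx_v: (n, d) cached keys/values of already-generated tokens (KV cache,
    #               reused across steps instead of being recomputed)
    # pos_queries:  (m, d) learnable position-query embeddings for the m target slots
    # w_k, w_v:     (d, d) hypothetical projections giving the concurrent group its own K/V
    k = np.concatenate([ctx_k, pos_queries @ w_k])  # context keys + group keys
    v = np.concatenate([ctx_v, pos_queries @ w_v])  # group K/V => mutual visibility
    attn = softmax(pos_queries @ k.T / np.sqrt(pos_queries.shape[-1]))
    return attn @ v  # (m, d): one feature per target position, decoded in parallel
```

Because the queries carry only position information, any subset of positions can be decoded at any step, which is what makes the generation order and the degree of parallelism free parameters.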
[9] RandAR: Decoder-only Autoregressive Visual Generation in Random Orders
[10] Autoregressive Image Generation with Randomized Parallel Decoding
[56] Qwen2.5-Omni Technical Report
[57] Ar-diffusion: Auto-regressive diffusion model for text generation
[58] VideoGPT: Video Generation using VQ-VAE and Transformers
[59] Insertion transformer: Flexible sequence generation via insertion operations
[60] Larp: Tokenizing videos with a learned autoregressive generative prior
[61] STAR: Scale-wise Text-conditioned AutoRegressive image generation
[62] Symbol-rooted cascade propagation in contextual memory routing for large language models
[63] Leapformer: Enabling linear transformers for autoregressive and simultaneous tasks via learned proportions
Locality-aware Generation Ordering
The authors propose a generation-order schedule guided by two principles: target positions should be spatially close to the existing context for strong conditioning, and concurrently generated tokens should be spatially distant from one another to reduce mutual dependency. The schedule exploits the spatial locality observed in the attention patterns of autoregressive image generators.
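The two principles can be combined into a simple greedy selection: rank free grid positions by distance to the nearest context token, then admit a position into the current group only if it is far enough from every position already in the group. This is an illustrative sketch, assuming Manhattan distance and a fixed spacing threshold; the paper's actual schedule may differ.

```python
def next_group(context, grid_h, grid_w, group_size, min_spacing=2):
    """Pick the next set of positions to decode in parallel (hypothetical sketch).

    Principle 1: prefer positions close to the existing context (strong conditioning).
    Principle 2: keep positions within a group mutually distant (weak inter-dependency).
    """
    def dist(a, b):
        return abs(a[0] - b[0]) + abs(a[1] - b[1])  # Manhattan distance on the token grid

    coords = [(r, c) for r in range(grid_h) for c in range(grid_w)]
    free = [p for p in coords if p not in context]
    # Principle 1: closest-to-context positions first
    free.sort(key=lambda p: min(dist(p, c) for c in context))
    group = []
    for p in free:
        # Principle 2: enforce pairwise spacing inside the concurrent group
        if all(dist(p, g) >= min_spacing for g in group):
            group.append(p)
        if len(group) == group_size:
            break
    return group
```

For example, with a single context token at the top-left corner of a 4×4 grid, the first group consists of immediate neighbours of the context that are still spaced apart from each other.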
[49] Nuwa-infinity: Autoregressive over autoregressive generation for infinite visual synthesis
[50] Toward Improving the Generation Quality of Autoregressive Slot VAEs
[51] Next patch prediction for autoregressive visual generation
[52] Mimt: Masked image modeling transformer for video compression
[53] HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation
[54] Autoregressive Image Generation with Linear Complexity: A Spatial-Aware Decay Perspective
[55] RichControl: Structure- and Appearance-Rich Training-Free Spatial Control for Text-to-Image Generation
Locality-aware Parallel Decoding Framework
The authors present a complete framework that combines flexible parallelized autoregressive modeling with locality-aware generation ordering, reducing the number of generation steps from 256 to 20 at 256×256 resolution and from 1024 to 48 at 512×512, while maintaining generation quality and achieving at least 3.4× lower latency than previous parallelized autoregressive models.
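A step count far below the token count implies that group sizes must grow over the course of decoding: early tokens have little context and are decoded in small groups, later ones in large groups. The sketch below is one plausible schedule under that assumption (a doubling warm-up followed by near-uniform groups); the paper's exact group sizes are not reproduced here, only the totals of 256 tokens in 20 steps and 1024 tokens in 48 steps.

```python
def step_schedule(n_tokens, n_steps):
    """Illustrative per-step group sizes: double the group while context is scarce,
    then spread the remaining tokens evenly over the remaining steps."""
    sizes, size = [], 1
    while len(sizes) < n_steps:
        remaining_steps = n_steps - len(sizes) - 1          # steps left after this one
        remaining = n_tokens - sum(sizes) - size            # tokens left after this one
        if remaining_steps == 0 or remaining <= size * 2 * remaining_steps:
            sizes.append(size)
            base, extra = divmod(remaining, remaining_steps) if remaining_steps else (0, 0)
            sizes += [base + (i < extra) for i in range((remaining_steps))]
            return sizes
        sizes.append(size)
        size *= 2
    return sizes
```

Any such schedule trades steps for per-step group size; the framework's contribution is that the locality-aware ordering keeps quality stable even when the groups become large.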