Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: Efficient Autoregressive Image Generation, Parallel Decoding
Abstract:

We present Locality-aware Parallel Decoding (LPD) to accelerate autoregressive image generation. Traditional autoregressive image generation relies on next-patch prediction, a memory-bound process that leads to high latency. Existing works have tried to parallelize next-patch prediction by shifting to multi-patch prediction, but achieved only limited parallelization. To achieve high parallelization while maintaining generation quality, we introduce two key techniques: (1) Flexible Parallelized Autoregressive Modeling, a novel architecture that enables arbitrary generation ordering and degrees of parallelization. It uses learnable position query tokens to guide generation at target positions while ensuring mutual visibility among concurrently generated tokens for consistent parallel decoding. (2) Locality-aware Generation Ordering, a novel schedule that forms groups to minimize intra-group dependencies and maximize contextual support, enhancing generation quality. With these designs, we reduce the generation steps from 256 to 20 (256×256 res.) and from 1024 to 48 (512×512 res.) without compromising quality on ImageNet class-conditional generation, while achieving at least 3.4× lower latency than previous parallelized autoregressive models.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Taxonomy

Core-task Taxonomy Papers: 48
Claimed Contributions: 3
Contribution Candidate Papers Compared: 26
Refutable Papers: 4

Research Landscape Overview

Core task: accelerating autoregressive image generation through parallel decoding. The field addresses the inherent sequential bottleneck of autoregressive models by exploring diverse strategies to predict multiple tokens simultaneously. The taxonomy reveals a rich landscape organized around twelve major branches. Spatial Locality-Based Parallel Decoding exploits the natural correlation among neighboring image patches to enable concurrent predictions, as seen in works like Zipar[1] and Parallelized Autoregressive Visual[2]. Hierarchical and Multi-Scale Autoregressive Modeling decomposes generation into coarse-to-fine stages, while Block-Based and Semi-Autoregressive Decoding groups tokens into chunks for batch processing. Random-Order and Flexible-Order approaches relax strict raster-scan dependencies, and Speculative and Iterative Parallel Decoding methods draft multiple candidates in parallel before verification. Masked and Non-Autoregressive branches draw inspiration from diffusion and masked language models, whereas Retrieval-Augmented and Context-Aware Generation approaches incorporate external knowledge. Additional branches cover variational latent models, bidirectional architectures, system-level optimizations, domain-specific extensions, and theoretical foundations, reflecting the breadth of innovation in this space.

Several active lines of work highlight contrasting trade-offs between generation quality, speed, and architectural complexity. Spatial locality methods such as Neighboring Autoregressive Modeling[4] and Next Block Prediction[5] achieve strong speedups by predicting contiguous regions, yet must carefully balance parallelism with maintaining coherence across boundaries. Speculative techniques and iterative refinement offer flexible acceleration but introduce verification overhead.
Locality Parallel Decoding[0] sits within the Spatial Locality-Based branch under Flexible Parallelized Autoregressive Modeling, emphasizing adaptive parallel prediction guided by local dependencies. Compared to neighbors like Parallelized Autoregressive Visual[2], which also leverages spatial structure, Locality Parallel Decoding[0] appears to focus on dynamic locality-aware scheduling rather than fixed block partitioning. This positioning reflects a broader trend toward flexible, content-adaptive parallelization strategies that aim to preserve autoregressive quality while unlocking substantial inference speedups.

Claimed Contributions

Flexible Parallelized Autoregressive Modeling

The authors introduce a novel architecture that decouples context representation from token generation by using learnable position query tokens. This design enables arbitrary generation order and degrees of parallelization while maintaining mutual visibility among concurrently generated tokens through specialized attention mechanisms, and inherits KV caching to avoid redundant computation.

Retrieved papers: 10 · Can Refute
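The described attention pattern can be made concrete with a small sketch. This is a hypothetical reconstruction, not the authors' implementation: `step_attention_mask` is an assumed helper that builds the boolean attention mask for one parallel decoding step, in which cached context tokens attend causally among themselves (so KV caching still applies) while the position query tokens attend to all context tokens and to each other (mutual visibility).

```python
def step_attention_mask(n_ctx: int, n_query: int) -> list[list[bool]]:
    """Attention mask for one parallel decoding step (True = may attend).

    Hypothetical sketch: the first n_ctx rows are already-generated
    context tokens served from the KV cache (causal among themselves);
    the last n_query rows are learnable position query tokens decoded
    concurrently, with full visibility of context and of each other.
    """
    n = n_ctx + n_query
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i < n_ctx:
                mask[i][j] = j <= i   # causal among cached context
            else:
                mask[i][j] = True     # queries see everything, incl. each other
    return mask
```

Because context rows remain strictly causal, their keys and values never change across steps, which is why the architecture can inherit KV caching as the contribution claims.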
Locality-aware Generation Ordering

The authors propose a generation order schedule guided by two principles: selecting target positions spatially close to existing context for strong conditioning, and ensuring concurrently generated tokens are spatially distant to reduce mutual dependency. This schedule leverages spatial locality patterns observed in autoregressive image generation attention.

Retrieved papers: 7
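The two ordering principles lend themselves to a greedy sketch. The following is a minimal illustrative version, not the authors' algorithm: `locality_schedule` (a hypothetical name) and its `min_sep` threshold are assumptions. At each step it ranks remaining positions by distance to the nearest already-generated token (closer means stronger conditioning) and admits a position into the current group only if it stays far from every position already chosen for that group (reducing intra-group dependency).

```python
import math

def locality_schedule(h, w, group_sizes, min_sep=2.0):
    """Greedy sketch of a locality-aware generation order.

    Principle 1: prefer positions spatially close to existing context.
    Principle 2: positions generated concurrently should be mutually
    distant (at least min_sep apart); if that is infeasible, fall back
    to filling the group so every position is eventually scheduled.
    """
    remaining = {(r, c) for r in range(h) for c in range(w)}
    generated, schedule = [], []
    for k in group_sizes:
        def ctx_dist(p):
            return min(math.dist(p, q) for q in generated) if generated else 0.0
        group = []
        for p in sorted(remaining, key=ctx_dist):       # closest to context first
            if len(group) == k:
                break
            if all(math.dist(p, q) >= min_sep for q in group):
                group.append(p)
        for p in sorted(remaining, key=ctx_dist):       # fallback fill
            if len(group) == k:
                break
            if p not in group:
                group.append(p)
        for p in group:
            remaining.discard(p)
        generated.extend(group)
        schedule.append(group)
    return schedule
```

On a 4×4 grid with group sizes [1, 3, 4, 8], the schedule covers all 16 positions exactly once while spreading each group across the grid.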
Locality-aware Parallel Decoding Framework

The authors present a complete framework combining flexible parallelized autoregressive modeling with locality-aware generation ordering to significantly reduce generation steps (from 256 to 20 for 256×256 resolution and 1024 to 48 for 512×512 resolution) while maintaining generation quality and achieving at least 3.4× lower latency than previous parallelized autoregressive models.

Retrieved papers: 9 · Can Refute
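The step-count reduction implies a per-step group-size schedule whose sizes sum to the token count (256 tokens in 20 steps, 1024 in 48). The sketch below is an illustrative assumption, not the authors' schedule: `group_size_schedule` decodes single tokens during a short warm-up (when little context exists) and then splits the remaining tokens as evenly as possible across the remaining steps.

```python
def group_size_schedule(n_tokens, n_steps, warmup=4):
    """Hypothetical parallelism schedule: `warmup` single-token steps,
    then the remaining tokens divided near-evenly over the remaining
    steps. Sizes always sum to n_tokens in exactly n_steps steps."""
    assert n_steps > warmup and n_tokens >= n_steps
    sizes = [1] * warmup
    rest, steps = n_tokens - warmup, n_steps - warmup
    base, extra = divmod(rest, steps)
    sizes += [base + (1 if i < extra else 0) for i in range(steps)]
    return sizes
```

For example, 256 tokens in 20 steps yields four warm-up steps followed by groups of roughly 15-16 tokens, consistent with the 256→20 reduction the framework reports.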

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Flexible Parallelized Autoregressive Modeling

The authors introduce a novel architecture that decouples context representation from token generation by using learnable position query tokens. This design enables arbitrary generation order and degrees of parallelization while maintaining mutual visibility among concurrently generated tokens through specialized attention mechanisms, and inherits KV caching to avoid redundant computation.

Contribution

Locality-aware Generation Ordering

The authors propose a generation order schedule guided by two principles: selecting target positions spatially close to existing context for strong conditioning, and ensuring concurrently generated tokens are spatially distant to reduce mutual dependency. This schedule leverages spatial locality patterns observed in autoregressive image generation attention.

Contribution

Locality-aware Parallel Decoding Framework

The authors present a complete framework combining flexible parallelized autoregressive modeling with locality-aware generation ordering to significantly reduce generation steps (from 256 to 20 for 256×256 resolution and 1024 to 48 for 512×512 resolution) while maintaining generation quality and achieving at least 3.4× lower latency than previous parallelized autoregressive models.