Evolutionary Caching to Accelerate Your Off-the-Shelf Diffusion Model

ICLR 2026 Conference Submission (Anonymous Authors)
Keywords: diffusion caching, image generation, efficient deep learning, diffusion transformers, inference acceleration
Abstract:

Diffusion-based image generation models excel at producing high-quality synthetic content, but suffer from slow and computationally expensive inference. Prior work has attempted to mitigate this by caching and reusing features within diffusion transformers across inference steps. These methods, however, often rely on rigid heuristics that result in limited acceleration or poor generalization across architectures. We propose Evolutionary Caching to Accelerate Diffusion models (ECAD), a genetic algorithm that learns efficient, per-model caching schedules forming a Pareto frontier, using only a small set of calibration prompts. ECAD requires no modifications to network parameters or reference images. It offers significant inference speedups, enables fine-grained control over the quality-latency trade-off, and adapts seamlessly to different diffusion models. Notably, ECAD's learned schedules can generalize effectively to resolutions and model variants not seen during calibration. We evaluate ECAD on PixArt-alpha, PixArt-Sigma, and FLUX-1.dev using multiple metrics (FID, CLIP, Image Reward) across diverse benchmarks (COCO, MJHQ-30k, PartiPrompts), demonstrating consistent improvements over previous approaches. On PixArt-alpha, ECAD identifies a schedule that outperforms the previous state-of-the-art method by 4.47 COCO FID while increasing inference speedup from 2.35x to 2.58x. Our results establish ECAD as a scalable and generalizable approach for accelerating diffusion inference.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes ECAD, a genetic algorithm that learns per-model caching schedules for diffusion transformers, forming a Pareto frontier over quality-latency trade-offs. According to the taxonomy, ECAD resides in the 'Learned and Evolutionary Scheduling' leaf under 'Cache Scheduling and Optimization'. Notably, this leaf contains only the original paper itself—no sibling papers were identified in the taxonomy. This suggests that evolutionary or genetic algorithm-based approaches to caching schedule optimization represent a relatively sparse research direction within the broader field of diffusion acceleration.

The taxonomy reveals that ECAD's immediate neighbors include 'Adaptive and Dynamic Scheduling' (runtime-adaptive methods) and 'Error-Aware and Constraint-Based Optimization' (methods minimizing error accumulation). The broader 'Cache Scheduling and Optimization' branch sits alongside 'Feature Caching Strategies and Mechanisms', which explores what to cache (temporal, spatial, token-level), and 'Application-Specific Acceleration', which tailors caching to domains like video or editing. ECAD's evolutionary search distinguishes it from learned neural policies and fixed heuristics, positioning it as a middle ground between adaptability and computational overhead. The taxonomy's scope notes clarify that ECAD excludes runtime-adaptive approaches, focusing instead on offline schedule discovery.

Among the three contributions analyzed, 'ECAD: Evolutionary Caching to Accelerate Diffusion models' examined three candidates with zero refutable prior work, suggesting novelty in applying genetic algorithms to this problem. 'Pareto frontier formulation for diffusion caching' examined ten candidates, also with zero refutations, indicating that framing caching as multi-objective optimization may be underexplored. However, 'Component-level caching with binary tensor representation' examined ten candidates and found three refutable instances, suggesting that the core mechanism of selective feature reuse has substantial prior work. These statistics reflect a limited search scope of twenty-three total candidates, not an exhaustive survey.

Based on the limited search scope, ECAD appears to introduce a relatively novel optimization strategy (evolutionary search) to a well-studied problem (feature caching). The absence of sibling papers in its taxonomy leaf and the zero refutations for the evolutionary approach suggest originality in methodology, though the underlying caching mechanisms show overlap with existing work. The analysis covers top-K semantic matches and does not claim completeness; broader literature may reveal additional evolutionary or genetic algorithm applications to diffusion acceleration.

Taxonomy

Core-task taxonomy papers: 50
Claimed contributions: 3
Contribution candidate papers compared: 23
Refutable papers: 3

Research Landscape Overview

Core task: Accelerating diffusion model inference through feature caching optimization. The field has organized itself around several complementary directions. Feature Caching Strategies and Mechanisms explores fundamental approaches to reusing intermediate activations, ranging from simple uniform caching (DeepCache[3]) to more sophisticated token-level (Tokenwise Feature Caching[5]) and dual-stream methods (Dual Feature Caching[2]). Cache Scheduling and Optimization focuses on deciding when and what to cache, including fixed heuristics, learned policies (Learning-to-Cache[26]), and evolutionary strategies. Speculative and Parallel Acceleration Techniques leverage prediction and concurrency (SpecDiff[6]), while Application-Specific Acceleration tailors caching to domains like video generation (Adaptive Caching Video[21]) or editing tasks. Integrated Acceleration Frameworks combine caching with quantization or pruning (CacheQuant[40]), and Theoretical Foundations and Analysis provide error bounds and convergence guarantees. Specialized Architectural Adaptations modify network designs to better exploit caching opportunities.

Within Cache Scheduling and Optimization, a central tension emerges between simplicity and adaptability. Fixed schedules offer predictability but may waste computation or degrade quality across diverse prompts, whereas learned approaches (Learning-to-Cache[26]) and profiling-based methods (Profiling-Based Reuse[25]) adapt to input characteristics at the cost of added overhead. Evolutionary Caching[0] sits within the Learned and Evolutionary Scheduling cluster, employing evolutionary algorithms to discover cache schedules that balance speed and fidelity. This contrasts with neighboring learned methods like Learning-to-Cache[26], which typically train neural policies, and with simpler heuristics such as DeepCache[3], which apply uniform intervals.
The evolutionary approach offers a middle ground: it searches over scheduling policies without requiring differentiable training, potentially discovering non-intuitive patterns that fixed rules miss while avoiding the sample complexity of end-to-end learning. Open questions remain about generalization across model architectures and the computational cost of the search itself.

Claimed Contributions

ECAD: Evolutionary Caching to Accelerate Diffusion models

The authors introduce ECAD, a genetic algorithm-based framework that discovers efficient caching schedules for diffusion models by formulating caching as a multi-objective Pareto optimization problem over image quality and inference speed. The method requires only a small set of calibration prompts and no modifications to network parameters.

3 retrieved papers
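The evolutionary search described above can be sketched in miniature. The following toy Python example is illustrative only: the population size, step count, latency model, and quality proxy are all assumptions, not the paper's method, which scores candidates with real image-quality metrics on calibration prompts.

```python
import random

# Toy genetic search over per-step caching schedules, in the spirit of ECAD.
# A schedule is a tuple of booleans: True = recompute at that diffusion step,
# False = reuse the cached features. All constants below are assumptions.

N_STEPS = 20       # number of diffusion steps (assumed)
POP_SIZE = 16
GENERATIONS = 10

def speedup(schedule):
    # Toy latency model: fewer recomputed steps -> larger speedup.
    return N_STEPS / max(1, sum(schedule))

def quality(schedule):
    # Placeholder for a calibration-prompt quality score (e.g. Image Reward);
    # here we simply penalize long runs of consecutive cache reuse.
    worst, gap = 0, 0
    for recompute in schedule:
        gap = 0 if recompute else gap + 1
        worst = max(worst, gap)
    return -worst

def mutate(schedule, rate=0.1):
    # Flip each recompute/reuse bit with a small probability.
    return tuple((not bit) if random.random() < rate else bit for bit in schedule)

def dominated(a, b):
    # a, b = (quality, speedup); True if b is at least as good on both
    # objectives and strictly better on at least one.
    return b[0] >= a[0] and b[1] >= a[1] and (b[0] > a[0] or b[1] > a[1])

def evolve(seed=0):
    random.seed(seed)
    pop = [tuple(random.random() < 0.5 for _ in range(N_STEPS))
           for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        pop = pop + [mutate(s) for s in pop]            # offspring via mutation
        scored = [((quality(s), speedup(s)), s) for s in pop]
        # Survivors: the non-dominated set, i.e. the running Pareto frontier.
        front = [s for f, s in scored
                 if not any(dominated(f, g) for g, t in scored if t is not s)]
        pop = front[:POP_SIZE]
        while len(pop) < POP_SIZE:                      # refill from the frontier
            pop.append(mutate(random.choice(front)))
    return pop
```

Because selection keeps the whole non-dominated set rather than a single best schedule, the final population approximates a Pareto frontier of quality-latency trade-offs, matching the framing in the contribution above.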
Pareto frontier formulation for diffusion caching

The authors reframe diffusion caching as a multi-objective optimization problem that discovers Pareto frontiers, enabling fine-grained control over quality-latency trade-offs rather than offering only a few discrete schedules with fixed trade-offs as in prior heuristic-based approaches.

10 retrieved papers
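The Pareto formulation can be made concrete with a small sketch. The candidate scores and the `pareto_frontier`/`pick` helpers below are hypothetical, but they show how exposing the full frontier (rather than a few fixed schedules) lets a user select the best quality within any latency budget.

```python
# Illustrative sketch: caching schedules scored as (FID, latency) points,
# lower is better on both axes. The numbers are made up for demonstration.

def pareto_frontier(points):
    """Return the subset of (fid, latency) points not dominated by any other."""
    def dominates(p, q):
        return p[0] <= q[0] and p[1] <= q[1] and p != q
    return sorted(p for p in points
                  if not any(dominates(q, p) for q in points))

# Hypothetical candidate schedules scored during calibration.
candidates = [(22.1, 9.8), (25.6, 4.2), (23.0, 6.1), (26.0, 6.5), (24.0, 5.0)]
front = pareto_frontier(candidates)   # (26.0, 6.5) is dominated and dropped

def pick(front, latency_budget):
    """Best-quality frontier point within a latency budget, if any."""
    feasible = [p for p in front if p[1] <= latency_budget]
    return min(feasible) if feasible else None

pick(front, 6.0)   # -> (24.0, 5.0)
```

A heuristic method offering only one or two fixed schedules cannot answer the `latency_budget` query this granularly; the frontier makes the whole trade-off curve available at deployment time.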
Component-level caching with binary tensor representation

The authors propose a component-level caching strategy for DiT blocks represented as a binary tensor, where individual functional components (self-attention, cross-attention, feedforward) can be selectively cached or recomputed at each timestep and block, enabling more granular optimization than block-level caching.

10 retrieved papers (can refute)
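The binary-tensor representation described above can be sketched as follows. This is a hedged illustration, not the paper's code: the tensor sizes, component names, per-component costs, and the cache-lookup helper are all assumed for demonstration.

```python
import numpy as np

# Sketch of a component-level caching schedule as a binary tensor.
# schedule[t, b, c] == 1 means "recompute component c of block b at timestep t";
# 0 means "reuse the cached output". All sizes and costs below are assumptions.

T, B = 20, 28                              # timesteps and DiT blocks (assumed)
COMPONENTS = ("self_attn", "cross_attn", "ffn")

rng = np.random.default_rng(0)
schedule = rng.integers(0, 2, size=(T, B, len(COMPONENTS)), dtype=np.uint8)
schedule[0] = 1                            # first step: nothing cached yet, compute all

def relative_cost(schedule, unit_cost=(1.0, 0.8, 1.5)):
    """Fraction of full (cache-free) compute this schedule performs (toy model)."""
    recomputes = schedule.sum(axis=(0, 1)).astype(float)   # per-component counts
    full = schedule.shape[0] * schedule.shape[1] * np.asarray(unit_cost)
    return float((recomputes * np.asarray(unit_cost)).sum() / full.sum())

cache = {}
def run_component(t, b, name, fresh_value):
    """Recompute per the schedule; otherwise serve the last cached output."""
    key = (b, name)
    if schedule[t, b, COMPONENTS.index(name)] or key not in cache:
        cache[key] = fresh_value
    return cache[key]
```

Treating each (timestep, block, component) cell independently is what gives the search its granularity: a block-level scheme would collapse the last tensor axis, forcing self-attention, cross-attention, and the feedforward sublayer to share one cache decision.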

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current top-K core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it appears structurally isolated, which is a partial signal of novelty, though one constrained by search coverage and taxonomy granularity.
