Evolutionary Caching to Accelerate Your Off-the-Shelf Diffusion Model

ICLR 2026 Conference Submission (Anonymous Authors)
Keywords: diffusion caching, image generation, efficient deep learning, diffusion transformers, inference acceleration
Abstract:

Diffusion-based image generation models excel at producing high-quality synthetic content, but suffer from slow and computationally expensive inference. Prior work has attempted to mitigate this by caching and reusing features within diffusion transformers across inference steps. These methods, however, often rely on rigid heuristics that result in limited acceleration or poor generalization across architectures. We propose Evolutionary Caching to Accelerate Diffusion models (ECAD), a genetic algorithm that learns efficient, per-model caching schedules forming a Pareto frontier, using only a small set of calibration prompts. ECAD requires no modifications to network parameters or reference images. It offers significant inference speedups, enables fine-grained control over the quality-latency trade-off, and adapts seamlessly to different diffusion models. Notably, ECAD's learned schedules can generalize effectively to resolutions and model variants not seen during calibration. We evaluate ECAD on PixArt-alpha, PixArt-Sigma, and FLUX-1.dev using multiple metrics (FID, CLIP, Image Reward) across diverse benchmarks (COCO, MJHQ-30k, PartiPrompts), demonstrating consistent improvements over previous approaches. On PixArt-alpha, ECAD identifies a schedule that outperforms the previous state-of-the-art method by 4.47 COCO FID while increasing inference speedup from 2.35x to 2.58x. Our results establish ECAD as a scalable and generalizable approach for accelerating diffusion inference.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes ECAD, a genetic algorithm that learns per-model caching schedules for diffusion transformers, forming a Pareto frontier over quality-latency trade-offs. According to the taxonomy, ECAD resides in the 'Learned and Evolutionary Scheduling' leaf under 'Cache Scheduling and Optimization'. Notably, this leaf contains only the original paper itself—no sibling papers were identified in the taxonomy. This suggests that evolutionary or genetic algorithm-based approaches to caching schedule optimization represent a relatively sparse research direction within the broader field of diffusion acceleration.

The taxonomy reveals that ECAD's immediate neighbors include 'Adaptive and Dynamic Scheduling' (runtime-adaptive methods) and 'Error-Aware and Constraint-Based Optimization' (methods minimizing error accumulation). The broader 'Cache Scheduling and Optimization' branch sits alongside 'Feature Caching Strategies and Mechanisms', which explores what to cache (temporal, spatial, token-level), and 'Application-Specific Acceleration', which tailors caching to domains like video or editing. ECAD's evolutionary search distinguishes it from learned neural policies and fixed heuristics, positioning it as a middle ground between adaptability and computational overhead. The taxonomy's scope notes clarify that ECAD excludes runtime-adaptive approaches, focusing instead on offline schedule discovery.

Among the three contributions analyzed, 'ECAD: Evolutionary Caching to Accelerate Diffusion models' examined three candidates with zero refutable prior work, suggesting novelty in applying genetic algorithms to this problem. 'Pareto frontier formulation for diffusion caching' examined ten candidates, also with zero refutations, indicating that framing caching as multi-objective optimization may be underexplored. However, 'Component-level caching with binary tensor representation' examined ten candidates and found three refutable instances, suggesting that the core mechanism of selective feature reuse has substantial prior work. These statistics reflect a limited search scope of twenty-three total candidates, not an exhaustive survey.

Based on the limited search scope, ECAD appears to introduce a relatively novel optimization strategy (evolutionary search) to a well-studied problem (feature caching). The absence of sibling papers in its taxonomy leaf and the zero refutations for the evolutionary approach suggest originality in methodology, though the underlying caching mechanisms show overlap with existing work. The analysis covers top-K semantic matches and does not claim completeness; broader literature may reveal additional evolutionary or genetic algorithm applications to diffusion acceleration.

Taxonomy

Core-task taxonomy papers: 50
Claimed contributions: 3
Contribution candidate papers compared: 23
Refutable papers: 3

Research Landscape Overview

Core task: Accelerating diffusion model inference through feature caching optimization. The field has organized itself around several complementary directions. Feature Caching Strategies and Mechanisms explores fundamental approaches to reusing intermediate activations, ranging from simple uniform caching (DeepCache[3]) to more sophisticated token-level (Tokenwise Feature Caching[5]) and dual-stream methods (Dual Feature Caching[2]). Cache Scheduling and Optimization focuses on deciding when and what to cache, including fixed heuristics, learned policies (Learning-to-Cache[26]), and evolutionary strategies. Speculative and Parallel Acceleration Techniques leverage prediction and concurrency (SpecDiff[6]), while Application-Specific Acceleration tailors caching to domains like video generation (Adaptive Caching Video[21]) or editing tasks. Integrated Acceleration Frameworks combine caching with quantization or pruning (CacheQuant[40]), and Theoretical Foundations and Analysis provide error bounds and convergence guarantees. Specialized Architectural Adaptations modify network designs to better exploit caching opportunities.

Within Cache Scheduling and Optimization, a central tension emerges between simplicity and adaptability. Fixed schedules offer predictability but may waste computation or degrade quality across diverse prompts, whereas learned approaches (Learning-to-Cache[26]) and profiling-based methods (Profiling-Based Reuse[25]) adapt to input characteristics at the cost of added overhead. Evolutionary Caching[0] sits within the Learned and Evolutionary Scheduling cluster, employing evolutionary algorithms to discover cache schedules that balance speed and fidelity. This contrasts with neighboring learned methods like Learning-to-Cache[26], which typically train neural policies, and with simpler heuristics such as DeepCache[3], which apply uniform intervals.
The evolutionary approach offers a middle ground: it searches over scheduling policies without requiring differentiable training, potentially discovering non-intuitive patterns that fixed rules miss while avoiding the sample complexity of end-to-end learning. Open questions remain about generalization across model architectures and the computational cost of the search itself.

Claimed Contributions

ECAD: Evolutionary Caching to Accelerate Diffusion models

The authors introduce ECAD, a genetic algorithm-based framework that discovers efficient caching schedules for diffusion models by formulating caching as a multi-objective Pareto optimization problem over image quality and inference speed. The method requires only a small set of calibration prompts and no modifications to network parameters.

3 retrieved papers
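The evolutionary search described above can be sketched in miniature. The following toy Python example is illustrative only: the population size, step count, latency model, and quality proxy are all assumptions, not the paper's method, which scores candidates with real image-quality metrics on calibration prompts.

```python
import random

# Toy genetic search over per-step caching schedules, in the spirit of ECAD.
# A schedule is a tuple of booleans: True = recompute at that diffusion step,
# False = reuse the cached features. All constants below are assumptions.

N_STEPS = 20       # number of diffusion steps (assumed)
POP_SIZE = 16
GENERATIONS = 10

def speedup(schedule):
    # Toy latency model: fewer recomputed steps -> larger speedup.
    return N_STEPS / max(1, sum(schedule))

def quality(schedule):
    # Placeholder for a calibration-prompt quality score (e.g. Image Reward);
    # here we simply penalize long runs of consecutive cache reuse.
    worst, gap = 0, 0
    for recompute in schedule:
        gap = 0 if recompute else gap + 1
        worst = max(worst, gap)
    return -worst

def mutate(schedule, rate=0.1):
    # Flip each recompute/reuse bit with a small probability.
    return tuple((not bit) if random.random() < rate else bit for bit in schedule)

def dominated(a, b):
    # a, b = (quality, speedup); True if b is at least as good on both
    # objectives and strictly better on at least one.
    return b[0] >= a[0] and b[1] >= a[1] and (b[0] > a[0] or b[1] > a[1])

def evolve(seed=0):
    random.seed(seed)
    pop = [tuple(random.random() < 0.5 for _ in range(N_STEPS))
           for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        pop = pop + [mutate(s) for s in pop]            # offspring via mutation
        scored = [((quality(s), speedup(s)), s) for s in pop]
        # Survivors: the non-dominated set, i.e. the running Pareto frontier.
        front = [s for f, s in scored
                 if not any(dominated(f, g) for g, t in scored if t is not s)]
        pop = front[:POP_SIZE]
        while len(pop) < POP_SIZE:                      # refill from the frontier
            pop.append(mutate(random.choice(front)))
    return pop
```

Because selection keeps the whole non-dominated set rather than a single best schedule, the final population approximates a Pareto frontier of quality-latency trade-offs, matching the framing in the contribution above.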
Pareto frontier formulation for diffusion caching

The authors reframe diffusion caching as a multi-objective optimization problem that discovers Pareto frontiers, enabling fine-grained control over quality-latency trade-offs rather than offering only a few discrete schedules with fixed trade-offs as in prior heuristic-based approaches.

10 retrieved papers
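The Pareto formulation can be made concrete with a small sketch. The candidate scores and the `pareto_frontier`/`pick` helpers below are hypothetical, but they show how exposing the full frontier (rather than a few fixed schedules) lets a user select the best quality within any latency budget.

```python
# Illustrative sketch: caching schedules scored as (FID, latency) points,
# lower is better on both axes. The numbers are made up for demonstration.

def pareto_frontier(points):
    """Return the subset of (fid, latency) points not dominated by any other."""
    def dominates(p, q):
        return p[0] <= q[0] and p[1] <= q[1] and p != q
    return sorted(p for p in points
                  if not any(dominates(q, p) for q in points))

# Hypothetical candidate schedules scored during calibration.
candidates = [(22.1, 9.8), (25.6, 4.2), (23.0, 6.1), (26.0, 6.5), (24.0, 5.0)]
front = pareto_frontier(candidates)   # (26.0, 6.5) is dominated and dropped

def pick(front, latency_budget):
    """Best-quality frontier point within a latency budget, if any."""
    feasible = [p for p in front if p[1] <= latency_budget]
    return min(feasible) if feasible else None

pick(front, 6.0)   # -> (24.0, 5.0)
```

A heuristic method offering only one or two fixed schedules cannot answer the `latency_budget` query this granularly; the frontier makes the whole trade-off curve available at deployment time.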
Component-level caching with binary tensor representation

The authors propose a component-level caching strategy for DiT blocks represented as a binary tensor, where individual functional components (self-attention, cross-attention, feedforward) can be selectively cached or recomputed at each timestep and block, enabling more granular optimization than block-level caching.

10 retrieved papers (can refute)
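The binary-tensor representation described above can be sketched as follows. This is a hedged illustration, not the paper's code: the tensor sizes, component names, per-component costs, and the cache-lookup helper are all assumed for demonstration.

```python
import numpy as np

# Sketch of a component-level caching schedule as a binary tensor.
# schedule[t, b, c] == 1 means "recompute component c of block b at timestep t";
# 0 means "reuse the cached output". All sizes and costs below are assumptions.

T, B = 20, 28                              # timesteps and DiT blocks (assumed)
COMPONENTS = ("self_attn", "cross_attn", "ffn")

rng = np.random.default_rng(0)
schedule = rng.integers(0, 2, size=(T, B, len(COMPONENTS)), dtype=np.uint8)
schedule[0] = 1                            # first step: nothing cached yet, compute all

def relative_cost(schedule, unit_cost=(1.0, 0.8, 1.5)):
    """Fraction of full (cache-free) compute this schedule performs (toy model)."""
    recomputes = schedule.sum(axis=(0, 1)).astype(float)   # per-component counts
    full = schedule.shape[0] * schedule.shape[1] * np.asarray(unit_cost)
    return float((recomputes * np.asarray(unit_cost)).sum() / full.sum())

cache = {}
def run_component(t, b, name, fresh_value):
    """Recompute per the schedule; otherwise serve the last cached output."""
    key = (b, name)
    if schedule[t, b, COMPONENTS.index(name)] or key not in cache:
        cache[key] = fresh_value
    return cache[key]
```

Treating each (timestep, block, component) cell independently is what gives the search its granularity: a block-level scheme would collapse the last tensor axis, forcing self-attention, cross-attention, and the feedforward sublayer to share one cache decision.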

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current top-K core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it appears structurally isolated, which is a partial signal of novelty, though one constrained by search coverage and taxonomy granularity.
