Evolutionary Caching to Accelerate Your Off-the-Shelf Diffusion Model
Overview
Overall Novelty Assessment
The paper proposes ECAD, a genetic algorithm that learns per-model caching schedules for diffusion transformers, forming a Pareto frontier over quality-latency trade-offs. According to the taxonomy, ECAD resides in the 'Learned and Evolutionary Scheduling' leaf under 'Cache Scheduling and Optimization'. Notably, this leaf contains only the original paper itself—no sibling papers were identified in the taxonomy. This suggests that evolutionary or genetic algorithm-based approaches to caching schedule optimization represent a relatively sparse research direction within the broader field of diffusion acceleration.
The taxonomy reveals that ECAD's immediate neighbors include 'Adaptive and Dynamic Scheduling' (runtime-adaptive methods) and 'Error-Aware and Constraint-Based Optimization' (methods minimizing error accumulation). The broader 'Cache Scheduling and Optimization' branch sits alongside 'Feature Caching Strategies and Mechanisms', which explores what to cache (temporal, spatial, token-level), and 'Application-Specific Acceleration', which tailors caching to domains like video or editing. ECAD's evolutionary search distinguishes it from learned neural policies and fixed heuristics, positioning it between the adaptability of learned policies and the low overhead of fixed rules. The taxonomy's scope notes clarify that this leaf excludes runtime-adaptive approaches: ECAD discovers its schedules offline, before deployment.
Of the three contributions analyzed, 'ECAD: Evolutionary Caching to Accelerate Diffusion Models' was checked against three candidate papers, none of which refuted its novelty, suggesting that applying genetic algorithms to this problem is new. 'Pareto frontier formulation for diffusion caching' was checked against ten candidates, again with zero refutations, indicating that framing caching as multi-objective optimization may be underexplored. 'Component-level caching with binary tensor representation', however, was checked against ten candidates and refuted by three, suggesting that the core mechanism of selective feature reuse has substantial prior work. These statistics reflect a limited search scope of twenty-three candidates in total, not an exhaustive survey.
Based on the limited search scope, ECAD appears to introduce a relatively novel optimization strategy (evolutionary search) to a well-studied problem (feature caching). The absence of sibling papers in its taxonomy leaf and the zero refutations for the evolutionary approach suggest originality in methodology, though the underlying caching mechanisms show overlap with existing work. The analysis covers top-K semantic matches and does not claim completeness; broader literature may reveal additional evolutionary or genetic algorithm applications to diffusion acceleration.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors introduce ECAD, a genetic algorithm-based framework that discovers efficient caching schedules for diffusion models by formulating caching as a multi-objective Pareto optimization problem over image quality and inference speed. The method requires only a small set of calibration prompts and no modifications to network parameters.
The authors reframe diffusion caching as a multi-objective optimization problem that discovers Pareto frontiers, enabling fine-grained control over quality-latency trade-offs rather than offering only a few discrete schedules with fixed trade-offs as in prior heuristic-based approaches.
The authors propose a component-level caching strategy for DiT blocks represented as a binary tensor, where individual functional components (self-attention, cross-attention, feedforward) can be selectively cached or recomputed at each timestep and block, enabling more granular optimization than block-level caching.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
Contribution Analysis
Detailed comparisons for each claimed contribution
ECAD: Evolutionary Caching to Accelerate Diffusion Models
The authors introduce ECAD, a genetic algorithm-based framework that discovers efficient caching schedules for diffusion models by formulating caching as a multi-objective Pareto optimization problem over image quality and inference speed. The method requires only a small set of calibration prompts and no modifications to network parameters.
[60] Two-timescale model caching and resource allocation for edge-enabled AI-generated content services
[61] OUSAC: Optimized Guidance Scheduling with Adaptive Caching for DiT Acceleration
[62] Space Computing: Architectures, Challenges, and Future Directions
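The evolutionary search behind this contribution can be illustrated with a minimal sketch. Everything below is assumed for illustration: the tensor dimensions, the toy quality and latency proxies (in ECAD, quality would come from a metric computed over calibration prompts and latency from measured inference time), and the simple scalarized selection (the paper maintains a full Pareto frontier rather than optimizing a single weighted objective).

```python
import random

random.seed(0)

# Hypothetical dimensions: 20 denoising steps, 4 DiT blocks, 3 components
# per block (self-attention, cross-attention, feedforward). A caching
# schedule is a flat binary genome: 1 = recompute, 0 = reuse cached output.
STEPS, BLOCKS, COMPS = 20, 4, 3
N = STEPS * BLOCKS * COMPS

def random_schedule():
    return [random.randint(0, 1) for _ in range(N)]

def latency(s):
    # Toy proxy: fraction of components actually recomputed.
    return sum(s) / N

def quality(s):
    # Toy proxy: recomputation early in denoising is weighted more heavily.
    weights = [STEPS - t for t in range(STEPS) for _ in range(BLOCKS * COMPS)]
    return sum(b * w for b, w in zip(s, weights)) / sum(weights)

def mutate(s, rate=0.05):
    # Flip each bit with a small probability.
    return [b ^ (random.random() < rate) for b in s]

def crossover(a, b):
    # Uniform crossover: take each bit from one of the two parents.
    return [x if random.random() < 0.5 else y for x, y in zip(a, b)]

def evolve(pop_size=20, generations=30, lam=0.5):
    # Scalarized fitness for brevity; ECAD instead keeps a frontier of
    # non-dominated schedules across the quality-latency trade-off.
    fitness = lambda s: quality(s) - lam * latency(s)
    pop = [random_schedule() for _ in range(pop_size)]
    for _ in range(generations):
        children = [mutate(crossover(*random.sample(pop, 2)))
                    for _ in range(pop_size)]
        # Elitist selection: keep the fittest schedules from parents + children.
        pop = sorted(pop + children, key=fitness, reverse=True)[:pop_size]
    return pop
```

With elitist selection the best fitness never decreases across generations; the surviving schedules concentrate recomputation where the toy quality proxy rewards it while shedding compute elsewhere.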
Pareto frontier formulation for diffusion caching
The authors reframe diffusion caching as a multi-objective optimization problem that discovers Pareto frontiers, enabling fine-grained control over quality-latency trade-offs rather than offering only a few discrete schedules with fixed trade-offs as in prior heuristic-based approaches.
[63] Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference
[64] SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation
[65] PROUD: PaRetO-gUided diffusion model for multi-objective generation
[66] Parrot: Pareto-optimal multi-reward reinforcement learning framework for text-to-image generation
[67] CD4LM: Consistency Distillation and aDaptive Decoding for Diffusion Language Models
[68] SPREAD: Sampling-based Pareto front Refinement via Efficient Adaptive Diffusion
[69] Analysis of Attention in Video Diffusion Transformers
[70] Planned Diffusion
[71] Remasking Discrete Diffusion Models with Inference-Time Scaling
[72] Nonlinear optimization-driven deep learning framework for medical image reconstruction via partial differential equations
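The Pareto-frontier framing can be made concrete with a small sketch. The (latency, quality) pairs below are invented for illustration; in the paper's setting each point would come from evaluating one candidate caching schedule.

```python
def pareto_front(points):
    """Return the non-dominated subset of (latency, quality) pairs.

    A point p is dominated if some other point q has latency <= p's and
    quality >= p's (with at least one strict, which follows from q != p).
    The survivors form the frontier exposed to the user: pick a latency
    budget, read off the best achievable quality under it.
    """
    front = []
    for p in points:
        dominated = any(
            q != p and q[0] <= p[0] and q[1] >= p[1]
            for q in points
        )
        if not dominated:
            front.append(p)
    return front

# Hypothetical evaluations of four candidate schedules.
candidates = [(0.20, 0.50), (0.40, 0.90), (0.30, 0.40), (0.50, 0.80)]
pareto_front(candidates)  # → [(0.2, 0.5), (0.4, 0.9)]
```

The third and fourth candidates are dominated (another schedule is both faster and higher quality), which is exactly the fine-grained trade-off control the contribution claims over methods that ship a few fixed schedules.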
Component-level caching with binary tensor representation
The authors propose a component-level caching strategy for DiT blocks represented as a binary tensor, where individual functional components (self-attention, cross-attention, feedforward) can be selectively cached or recomputed at each timestep and block, enabling more granular optimization than block-level caching.
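A rough sketch of what such a binary-tensor schedule might look like follows. The shapes, component ordering, and toy block below are assumptions for illustration, not the paper's implementation.

```python
import numpy as np  # assumption: schedules stored as NumPy arrays

# Hypothetical shape: (timesteps, blocks, components), with components
# indexed 0 = self-attention, 1 = cross-attention, 2 = feedforward.
# Entry 1 = recompute that component at that step; 0 = reuse the output
# cached at the last step where it was computed.
T, B, C = 4, 2, 3
schedule = np.ones((T, B, C), dtype=np.int8)
schedule[1, :, :] = 0   # step 1: reuse every component in every block
schedule[3, :, 2] = 0   # step 3: reuse only the feedforward outputs

class CachedBlock:
    """Toy stand-in for a DiT block with per-component caching."""

    def __init__(self, block_idx):
        self.idx = block_idx
        self.cache = {}  # component index -> last computed output

    def component(self, c, x):
        # Placeholder computation; a real block would run attention or MLP.
        return x + (self.idx + 1) * (c + 1)

    def forward(self, x, t):
        for c in range(C):
            if schedule[t, self.idx, c]:
                self.cache[c] = self.component(c, x)
            # otherwise reuse self.cache[c] from an earlier timestep
            x = self.cache[c]
        return x
```

Indexing the schedule at (timestep, block, component) decides recompute versus reuse, which is what gives the search finer granularity than block-level caching; the first timestep must recompute everything so that each cache entry exists before it is reused.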