Let Features Decide Their Own Solvers: Hybrid Feature Caching for Diffusion Transformers

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Efficient ML, Diffusion Transformer Acceleration, Feature Caching
Abstract:

Diffusion Transformers (DiTs) offer state-of-the-art fidelity in image and video synthesis, but their iterative sampling process remains a major bottleneck due to the high cost of transformer forward passes at each timestep. To mitigate this, feature caching has emerged as a training-free acceleration technique that reuses or forecasts hidden representations. However, existing methods often apply a uniform caching strategy across all feature dimensions, ignoring their heterogeneous dynamic behaviors. Therefore, we adopt a new perspective by modeling hidden feature evolution as a mixture of ODEs across dimensions, and introduce HyCa, a hybrid ODE-solver-inspired caching framework that applies dimension-wise caching strategies. HyCa achieves near-lossless acceleration across diverse domains and models without retraining, including a 5.56× speedup on FLUX and HunyuanVideo and a 6.24× speedup on Qwen-Image and Qwen-Image-Edit. Our code is included in the supplementary material and will be released on GitHub.
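To make the baseline concrete, the following is a minimal sketch of the uniform caching the abstract argues against: every feature dimension is refreshed or reused on the same fixed schedule. The function names, the toy Euler-style update, and the fixed interval are illustrative assumptions, not the paper's actual sampler.

```python
import numpy as np

def sample_with_uniform_cache(denoise, x, timesteps, cache_interval=2):
    """Toy sampler: run the expensive network only every `cache_interval`
    steps and reuse the cached feature on all other steps (uniform caching,
    applied identically to every feature dimension)."""
    cached = None
    for i, t in enumerate(timesteps):
        if i % cache_interval == 0:
            cached = denoise(x, t)   # full transformer forward pass
        # on skipped steps, `cached` is reused as-is
        x = x - 0.1 * cached         # toy Euler-style update rule
    return x
```

With `cache_interval=2`, half of the forward passes are skipped; the cost is that fast-moving feature dimensions are held constant just as long as slow-moving ones, which is exactly the uniformity HyCa removes.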

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes HyCa, a hybrid caching framework that models hidden feature evolution as a mixture of ODEs and applies dimension-wise caching strategies. It resides in the 'Caching with ODE Solvers and Sampling Optimization' leaf, which contains only three papers total, including this work and two siblings (AB-Cache and LazyDiT). This represents a relatively sparse research direction within the broader taxonomy of 50 papers across 23 leaf nodes, suggesting the integration of ODE-inspired solvers with feature caching remains an emerging area rather than a saturated one.

The taxonomy reveals that most caching research clusters around core mechanisms (uniform temporal, token-level selective, hierarchical block-level) and adaptive strategies (runtime-adaptive, frequency-aware, magnitude-based). HyCa's parent branch, 'Hybrid and Multi-Paradigm Acceleration,' also includes leaves for caching with parallelization and caching with pruning, indicating the field is exploring synergies between caching and complementary acceleration techniques. The scope note for HyCa's leaf explicitly excludes 'pure caching without solver integration,' positioning this work at the intersection of numerical methods and feature reuse—a boundary less explored than standalone caching or standalone solver optimization.

Among 30 candidates examined, the contribution-level analysis shows mixed novelty signals. 'Heterogeneous Feature Dynamics' (10 candidates, 0 refutable) and 'State-of-the-Art Acceleration Performance' (10 candidates, 0 refutable) show no clear overlap with prior work within the limited search scope. However, 'HyCa: Hybrid Feature Caching Framework' (10 candidates, 1 refutable) matches at least one candidate presenting overlapping prior work, suggesting the core framework design may share conceptual or technical elements with existing methods. The scale of this search (30 papers in total) means these findings reflect the top semantic matches rather than exhaustive coverage.

Given the sparse population of the ODE-solver-caching leaf and the absence of refutation for two of three contributions, the work appears to occupy a relatively novel niche within the examined scope. The single refutable candidate for the framework contribution indicates some prior overlap exists, but the limited search scale and the emerging nature of this hybrid paradigm suggest the paper may still offer substantive advances. A broader literature review would be needed to confirm whether the dimension-wise ODE mixture modeling and the specific solver integration represent genuine departures from existing hybrid acceleration methods.

Taxonomy

- Core-task Taxonomy Papers: 50
- Claimed Contributions: 3
- Contribution Candidate Papers Compared: 30
- Refutable Papers: 1

Research Landscape Overview

Core task: accelerating diffusion transformer inference through feature caching. The field has organized itself around several complementary strategies for reducing computational overhead in diffusion models. Core Feature Caching Mechanisms establish foundational techniques such as token-level reuse (Token Caching[4], KV Caching Diffusion[2]) and dual-stream approaches (Dual Feature Caching[5]), while Adaptive Caching Strategies introduce runtime flexibility through methods like Runtime-Adaptive Caching[16] and cluster-driven selection (Cluster-Driven Caching[22]). Predictive and Forecasting-Based Caching leverages Taylor expansions and confidence gating (TaylorSeers[26], Confidence-Gated Taylor[28]) to anticipate future features, whereas Learning-Based Caching Optimization trains policies or networks to decide what and when to cache (Learning-to-Cache[8]).

Architectural and Structural Enhancements modify model designs directly (Long-Skip-Connections[14], Decoupled Diffusion Transformer[41]), and Redundancy Analysis and Profiling systematically identify reusable computations (Unveiling Redundancy[17], Profiling-Based Reuse[36]). Domain-Specific Caching Applications tailor strategies to video generation (Adaptive Caching Video[23]) or text-to-speech (Text-to-Speech Caching[18]), while Hybrid and Multi-Paradigm Acceleration combines caching with ODE solvers or sampling optimizations, and Universal and Cross-Architecture Caching aims for broad applicability across model families (OmniCache[35]).

Recent work has explored trade-offs between caching granularity, error accumulation, and computational savings. Fine-grained token-wise methods (Token-wise Feature Caching[9], Rethinking Token-wise Caching[40]) offer precise control but may introduce overhead, whereas block-level or layer-skipping approaches (BlockDance[27], Skip Branches[38]) achieve coarser speedups with simpler logic.
Hybrid Feature Caching[0] sits within the Hybrid and Multi-Paradigm Acceleration branch, combining caching with ODE solver refinements to balance quality and speed—a direction also pursued by LazyDiT[44] and AB-Cache[46], which similarly integrate sampling optimizations. Compared to purely adaptive schemes like Runtime-Adaptive Caching[16] or purely predictive methods like TaylorSeers[26], Hybrid Feature Caching[0] emphasizes synergy between multiple acceleration paradigms, aiming to mitigate the exposure bias and error drift that can arise when caching decisions are made in isolation from the underlying numerical solver.

Claimed Contributions

Heterogeneous Feature Dynamics in Diffusion Transformers

The authors demonstrate that hidden feature dimensions in Diffusion Transformers evolve according to distinct temporal patterns rather than a single unified process. Through clustering analysis, they reveal that these dynamics are consistent across prompts, timesteps, and resolutions, motivating the need for dimension-specific solvers.
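A clustering analysis of this kind could be sketched as follows: record each feature dimension's trajectory over the timesteps, normalize it, and group dimensions whose trajectories have similar shapes. The tiny k-means, the normalization, and the synthetic trajectory shapes below are illustrative assumptions; the paper's actual clustering procedure is not reproduced here.

```python
import numpy as np

def cluster_feature_dims(traj, k=2, iters=20):
    """Group feature dimensions by the shape of their temporal trajectory.

    traj: array of shape (T, D), one hidden-feature snapshot per timestep.
    Each dimension's trajectory is normalized, then clustered with a tiny
    k-means so dimensions that evolve similarly share a cluster id.
    """
    X = traj.T  # (D, T): one row per feature dimension
    X = (X - X.mean(1, keepdims=True)) / (X.std(1, keepdims=True) + 1e-8)
    # farthest-point initialization keeps this toy k-means deterministic
    centers = [X[0]]
    for _ in range(k - 1):
        dists = np.min([((X - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(X[np.argmax(dists)])
    centers = np.stack(centers)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(0)
    return labels
```

On a toy trajectory matrix where two dimensions ramp linearly and two oscillate, the ramping dimensions land in one cluster and the oscillating ones in the other, which is the kind of structure that would motivate dimension-specific solvers.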

10 retrieved papers
HyCa: Hybrid Feature Caching Framework

HyCa is a training-free acceleration framework that models hidden feature evolution as a mixture of ODEs. It clusters feature dimensions by their temporal behaviors and assigns the optimal ODE solver to each cluster through a one-time offline optimization, enabling efficient and adaptive feature prediction during inference.
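The per-cluster solver assignment described above can be sketched as follows. The two-entry solver menu (feature reuse as a zero-order hold, linear extrapolation as a first-order step), the squared-error selection objective, and all function names are assumptions for illustration; the paper's actual candidate ODE solvers and offline optimization are richer than this.

```python
import numpy as np

# Candidate "solvers" for forecasting a feature at step t+1 from its history.
# This two-entry menu stands in for the paper's richer ODE-solver candidates.
SOLVERS = {
    "reuse":  lambda prev, prev2: prev,              # zero-order hold
    "linear": lambda prev, prev2: 2 * prev - prev2,  # first-order extrapolation
}

def assign_solvers(traj, labels):
    """One-time offline step: per cluster, pick the candidate solver with the
    lowest forecasting error on a calibration trajectory of shape (T, D)."""
    assignment = {}
    for c in np.unique(labels):
        dims = labels == c
        errs = {
            name: np.mean((f(traj[1:-1, dims], traj[:-2, dims])
                           - traj[2:, dims]) ** 2)
            for name, f in SOLVERS.items()
        }
        assignment[int(c)] = min(errs, key=errs.get)
    return assignment

def predict_next(prev, prev2, labels, assignment):
    """Inference-time step: forecast each dimension with its cluster's solver
    instead of running a full transformer forward pass."""
    out = np.empty_like(prev)
    for c, name in assignment.items():
        dims = labels == c
        out[dims] = SOLVERS[name](prev[dims], prev2[dims])
    return out
```

On a calibration trajectory where one dimension oscillates and another grows linearly, the offline step assigns reuse to the oscillating cluster and extrapolation to the linear one, matching the intuition that each group of dimensions gets the solver that best tracks its dynamics.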

10 retrieved papers
Can Refute
State-of-the-Art Acceleration Performance Across Diverse Tasks

The authors demonstrate that HyCa achieves near-lossless acceleration across multiple domains and models, including 5.56× speedup on FLUX and HunyuanVideo, and 6.24× speedup on Qwen-Image and Qwen-Image-Edit, without requiring retraining. The method is also compatible with distillation techniques, reaching up to 24.4× speedup.

10 retrieved papers

