Relational Feature Caching for Accelerating Diffusion Transformers

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: Diffusion transformer, Feature caching
Abstract:

Feature caching approaches accelerate diffusion transformers (DiTs) by storing the output features of computationally expensive modules at certain timesteps, and exploiting them for subsequent steps to reduce redundant computations. Recent forecasting-based caching approaches employ temporal extrapolation techniques to approximate the output features with cached ones. Although effective, relying exclusively on temporal extrapolation still suffers from significant prediction errors, leading to performance degradation. Through a detailed analysis, we find that 1) these errors stem from the irregular magnitude of changes in the output features, and 2) an input feature of a module is strongly correlated with the corresponding output. Based on this, we propose relational feature caching (RFC), a novel framework that leverages the input-output relationship to enhance the accuracy of the feature prediction. Specifically, we introduce relational feature estimation (RFE) to estimate the magnitude of changes in the output features from the inputs, enabling more accurate feature predictions. We also present relational cache scheduling (RCS), which estimates the prediction errors using the input features and performs full computations only when the errors are expected to be substantial. Extensive experiments across various DiT models demonstrate that RFC consistently outperforms prior approaches significantly. We will release our code publicly upon acceptance.
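As a concrete illustration of the baseline the abstract describes, the sketch below computes an expensive module only at selected timesteps and reuses the stored output in between. It is a toy loop under illustrative assumptions: `block`, `xs`, and `cache_interval` are placeholders, not the paper's actual modules or schedule.

```python
def cached_rollout(block, xs, cache_interval=2):
    """Toy feature-caching loop: run the expensive `block` only every
    `cache_interval` steps and reuse the cached output in between."""
    outputs, cached = [], None
    for t, x in enumerate(xs):
        if t % cache_interval == 0:
            cached = block(x)   # full computation refreshes the cache
        outputs.append(cached)  # skipped steps reuse the cached output
    return outputs
```

For example, `cached_rollout(lambda x: 2 * x, [1, 2, 3, 4])` performs the full computation only at steps 0 and 2 and returns `[2, 2, 6, 6]`; forecasting-based methods replace the naive reuse on skipped steps with an extrapolated prediction.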

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes relational feature caching (RFC), a framework that leverages input-output relationships to improve feature prediction accuracy in diffusion transformer acceleration. Within the taxonomy, it occupies a unique leaf under Core Feature Caching Mechanisms called 'Relational and Input-Output Modeling,' which contains only this single paper. This positioning suggests the work introduces a relatively novel direction in a field where most prior efforts focus on temporal extrapolation, block-level reuse, or error correction strategies rather than explicitly modeling relational dependencies between inputs and outputs.

The taxonomy reveals that neighboring research directions are densely populated with alternative caching strategies. Sibling categories include Temporal Feature Reuse (3 papers on direct reuse), Predictive Feature Caching (subdivided into Taylor expansion, ODE-based, Adams-Bashforth, and speculative sampling methods), and other Core Feature Caching Mechanisms like dual-stream architectures. The paper's emphasis on input-output relationships diverges from these purely temporal or frequency-based approaches, instead proposing that prediction errors can be reduced by estimating output changes from input features—a conceptual shift from extrapolating historical features alone.

Among the 24 candidate papers examined, none were found to refute the three core contributions: relational feature estimation (RFE), relational cache scheduling (RCS), and the overall RFC framework. RFE was assessed against 10 candidates with no refutations, RCS against 4 candidates with none, and the RFC framework against 10 candidates with none. Within this limited search scope, covering the examined top-K semantic matches and citation expansions, the specific combination of input-driven magnitude estimation and error-aware scheduling appears distinct from existing temporal extrapolation or gradient-based correction methods.

The analysis reflects a focused literature search rather than an exhaustive survey, examining 24 papers from a 50-paper taxonomy. While the absence of refutations among examined candidates indicates potential novelty, the search scope leaves open the possibility that related work exists outside the top-K semantic neighborhood. The isolated taxonomy position and lack of sibling papers in the same leaf further suggest that modeling input-output relationships for caching is an emerging direction, though broader validation would require examining additional candidates beyond the current sample.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 24
Refutable Papers: 0

Research Landscape Overview

Core task: Accelerating diffusion transformers through feature caching. The field has organized itself around several complementary strategies for reducing the computational burden of iterative diffusion sampling. At the highest level, Core Feature Caching Mechanisms explore foundational approaches such as relational modeling of input-output dependencies (e.g., Relational Feature Caching[0]) and dual-stream architectures (e.g., Dual Feature Caching[1]). Granularity and Scope of Caching investigates whether to reuse features at the token level (Tokenwise Feature Caching[2]), layer level, or across entire blocks, while Optimization and Error Correction Strategies focus on minimizing approximation errors through techniques like gradient-based refinement (Error-Optimized Cache[4]) or frequency-domain adjustments (Freqca[5]). Domain-Specific Acceleration tailors caching to particular modalities such as text-to-speech or video generation, and Architectural and System-Level Optimizations address hardware-aware dataflow and distributed inference. Cross-Domain and Generalized Caching seeks unified frameworks that apply across multiple tasks and model families.

A particularly active line of work centers on deciding when and where to reuse cached features, balancing speed gains against quality degradation. Some methods employ learned predictors or clustering (Learning-to-Cache[8], Cluster-Driven Caching[9]) to adaptively skip computations, while others rely on heuristic schedules or spectral analysis (SpeCa[3]).

Relational Feature Caching[0] sits within the Core Feature Caching Mechanisms branch, emphasizing the modeling of relationships between inputs and outputs to guide selective reuse. Compared to simpler uniform-interval schemes like DeepCache[10] or token-level strategies such as Tokenwise Feature Caching[2], Relational Feature Caching[0] aims to capture higher-order dependencies that inform which features remain stable across diffusion steps. This relational perspective contrasts with purely error-driven approaches (Error-Optimized Cache[4]) and complements frequency-based methods (Freqca[5]) by offering a principled way to predict feature evolution, positioning it as a foundational mechanism that other optimization and correction strategies can build upon.

Claimed Contributions

Relational feature estimation (RFE)

RFE is a forecasting method that estimates the magnitude of changes in output features by leveraging the relationship between input and output feature variations. This approach addresses the irregular dynamics of feature changes across timesteps, improving prediction accuracy over temporal extrapolation techniques alone.

10 retrieved papers
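A minimal numerical sketch of the RFE idea, assuming a simple rescaling rule: where plain temporal extrapolation repeats the last output difference, the sketch rescales that difference by the measured input change. The function and variable names, and the norm-ratio rule itself, are illustrative assumptions; the paper's actual estimator is not reproduced here.

```python
import numpy as np

def rfe_predict(out_prev, out_prev2, x_cur, x_prev, x_prev2, eps=1e-8):
    """Illustrative relational feature estimation.

    Plain temporal extrapolation would predict
        out_prev + (out_prev - out_prev2),
    assuming the output keeps changing by a constant amount. This
    sketch instead rescales the cached output step by how much the
    *input* actually changed, exploiting the reported input-output
    correlation to handle irregular change magnitudes.
    """
    # Magnitude of the current input change relative to the previous one.
    ratio = (np.linalg.norm(x_cur - x_prev) + eps) / \
            (np.linalg.norm(x_prev - x_prev2) + eps)
    # Rescale the cached output difference by the input-change ratio.
    return out_prev + ratio * (out_prev - out_prev2)
```

Under a perfectly linear module (output = 2 × input), the rescaled prediction recovers the exact output even when the input step size changes, whereas constant-step extrapolation would not.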
Relational cache scheduling (RCS)

RCS is a dynamic caching strategy that determines when to perform full computations by estimating output prediction errors from input feature prediction errors. This adaptive scheduling reduces cache errors by performing full computations only when necessary, improving both quality and efficiency.

4 retrieved papers
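The scheduling decision can be sketched as follows. Because the input to a module at the current step is available exactly, its prediction error can be measured; the sketch then assumes the output error scales with it through a linear factor and triggers a full computation above a threshold. `slope`, `tau`, and the linear proxy are hypothetical choices, not the paper's calibrated estimator.

```python
import numpy as np

def rcs_should_recompute(x_cur, x_pred, slope=1.0, tau=0.1, eps=1e-8):
    """Illustrative relational cache scheduling decision.

    Measures the relative error of the *input* prediction (which is
    observable) and uses it as a proxy for the unobservable output
    prediction error. Full computation is requested only when the
    estimated error exceeds the budget `tau`.
    """
    input_err = np.linalg.norm(x_cur - x_pred) / (np.linalg.norm(x_cur) + eps)
    return slope * input_err > tau
```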
Relational feature caching (RFC) framework

RFC is a comprehensive framework that combines RFE and RCS to accelerate diffusion transformers by exploiting the relationship between input and output features. The framework consistently outperforms prior caching approaches across various DiT models and generative tasks.

10 retrieved papers
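Combining the two components above, one RFC denoising step might look like the following sketch. The cache state layout, the drift-based trigger, and the norm-ratio rescaling are all illustrative assumptions standing in for the paper's actual RFE estimator and RCS error model.

```python
import numpy as np

def rfc_step(block, x_cur, st, tau=0.5, eps=1e-8):
    """One illustrative RFC step. `st` caches the inputs/outputs of the
    two most recent fully computed steps (keys x0/o0 older, x1/o1 newer).

    RCS branch: if the input has drifted too far since the last full
    computation, recompute `block` and refresh the cache.
    RFE branch: otherwise, predict the output by rescaling the cached
    output difference with the measured input-change ratio.
    """
    drift = np.linalg.norm(x_cur - st["x1"]) / (np.linalg.norm(st["x1"]) + eps)
    if drift > tau:
        out = block(x_cur)                       # full computation
        st["x0"], st["o0"] = st["x1"], st["o1"]  # shift the cache window
        st["x1"], st["o1"] = x_cur, out
        return out
    ratio = (np.linalg.norm(x_cur - st["x1"]) + eps) / \
            (np.linalg.norm(st["x1"] - st["x0"]) + eps)
    return st["o1"] + ratio * (st["o1"] - st["o0"])
```

On a small input drift the step returns a cheap RFE prediction; on a large drift it falls back to the full computation, which is the quality/efficiency trade-off the contribution describes.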

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current top-K core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it appears structurally isolated, a partial signal of novelty that is nonetheless constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution: Relational feature estimation (RFE)

RFE is a forecasting method that estimates the magnitude of changes in output features by leveraging the relationship between input and output feature variations. This approach addresses the irregular dynamics of feature changes across timesteps, improving prediction accuracy over temporal extrapolation techniques alone.

Contribution: Relational cache scheduling (RCS)

RCS is a dynamic caching strategy that determines when to perform full computations by estimating output prediction errors from input feature prediction errors. This adaptive scheduling reduces cache errors by performing full computations only when necessary, improving both quality and efficiency.

Contribution: Relational feature caching (RFC) framework

RFC is a comprehensive framework that combines RFE and RCS to accelerate diffusion transformers by exploiting the relationship between input and output features. The framework consistently outperforms prior caching approaches across various DiT models and generative tasks.
